Abstract



Symbolic event recognition systems have been successfully applied to a variety of ap- 
plication domains, extracting useful information in the form of events, allowing experts or 
other systems to monitor and respond when significant events are recognised. In a typical 
event recognition application, however, these systems often have to deal with a significant 
amount of uncertainty. In this paper, we address the issue of uncertainty in logic-based 
event recognition by extending the Event Calculus with probabilistic reasoning. Markov 
Logic Networks are a natural candidate for our logic-based formalism. However, the tem- 
poral semantics of the Event Calculus introduce a number of challenges for the proposed 
model. We show how and under what assumptions we can overcome these problems. Ad- 
ditionally, we study how probabilistic modelling changes the behaviour of the formalism, 
affecting its key property, the inertia of fluents. Furthermore, we demonstrate the advan- 
tages of the probabilistic Event Calculus through examples and experiments in the domain 
of activity recognition, using a publicly available dataset for video surveillance. 

1. Introduction 

Symbolic event recognition systems have received attention in a variety of application
domains, such as health care monitoring, public transport management, telecommunication
network monitoring and activity recognition (Luckham, 2002; Paschke & Kozlenkov, 2009;
Artikis et al., 2010a). The aim of these systems is to extract useful information, in the form
of events, by processing time-evolving data that comes from various sources (e.g. various
types of sensor, surveillance cameras, network activity logs, etc.). The extracted information
can be exploited by other systems or human experts, in order to monitor an environment
and respond to the occurrence of significant events. The input to a symbolic event
recognition system consists of a stream of time-stamped symbols, called simple, derived events
(SDE). Consider, for example, a video tracking system detecting that someone is walking for
a sequence of video frames. Based on such time-stamped input SDE observations, the
symbolic event recognition system recognises composite events (CE) of interest, for instance,
that some people have started to move together.

The recognition of CE may be associated with the occurrence of various SDE and
other CE involving multiple entities, e.g. people, vehicles, etc. CE, therefore, are relational
structures over other sub-events, either CE or SDE. Symbolic approaches, such as the Event
Calculus (Kowalski & Sergot, 1986; Artikis et al., 2010) and the Chronicle Recognition
System (Dousson & Maigat, 2007), can naturally and compactly represent relational CE
structures. Based on their formal and declarative semantics, they make it easy to
incorporate and exploit background knowledge, in order to improve the
accuracy of the event recognition system.

Symbolic methods, however, cannot handle uncertainty, which naturally exists in many
real-world event recognition applications. Event recognition systems often have to deal
with data that involves a significant amount of uncertainty (Shet et al., 2007; Artikis et al.,
2010; Gal et al., 2011): (a) Low-level detection systems often cannot detect all necessary
SDE for CE recognition. Logical definitions of CE, therefore, have to be composed with a
limited and often insufficient dictionary of SDE. (b) Partial and noisy observations result in
incomplete and erroneous SDE streams. For example, a sensor may fail for some period of
time and stop sending any information, interrupting the detection of an SDE. Similarly, noise
during the signal transmission may distort the observation values. (c) In particular, when
Machine Learning algorithms are applied, the inconsistencies between SDE streams and CE
annotations introduce further uncertainty. In the presence of various types of uncertainty,
logical definitions cannot capture perfectly the conditions under which a CE occurs.

Based on such imperfect CE definitions, the aim of this work is to recognise CE of
interest. We propose a probabilistic extension to the Event Calculus, by employing Markov
Logic Networks (MLN) (Domingos & Lowd, 2009). The Event Calculus is a formalism for
representing events and their effects. Beyond the advantages stemming from the fact that it
is a logic-based formalism with clear semantics, one of the most interesting properties of the
Event Calculus is that it handles the persistence of CE with domain-independent axioms.
On the other hand, MLN is a statistical relational framework that combines the expressivity
of first-order logic with the formal probabilistic properties of undirected graphical models
(see de Salvo Braz et al. (2008), Raedt and Kersting (2010) and Blockeel (2011) for recent
surveys on logic-based probabilistic models). By combining the Event Calculus with MLN,
we present a principled and powerful probabilistic logic-based method for event recognition.

In particular, the contributions of this work are the following:

• A probabilistic Event Calculus, based on MLN, for the task of event recognition. 
The method inherits the domain-independent properties of the Event Calculus and 
supports the probabilistic recognition of CE by imperfect definitions. 

• Efficient representation of the Event Calculus axioms and CE definitions in MLN. The 
method employs a discrete variant of the Event Calculus and translates the entire 
knowledge base into compact Markov networks, in order to avoid the combinatorial 
explosion caused by the expressivity of the logical formalism. 

• A thorough study of the behaviour of CE persistence. Under different conditions 
of interest, the method can model various types of CE persistence, ranging from 
deterministic to purely probabilistic. 

To demonstrate the benefits of the proposed approach, the method is evaluated in the
real-life event recognition task of human activity recognition and compared with its crisp
predecessor. The definitions of CE are domain-dependent rules that are given by humans
and expressed using the Event Calculus language. The method processes the rules in the
knowledge base and produces Markov networks of manageable size and complexity. Each
rule can be associated with a weight value, indicating a degree of confidence in it. Weights
are automatically estimated from a training set of examples. The input to the recognition
system is a sequence of SDE expressed as a narrative of ground predicates. Probabilistic
inference is used for recognising CE.

The remainder of the paper is organised as follows. First, in Section 2, we present the
target activity recognition application, in order to introduce a running example for the rest
of the paper. In Sections 3 and 4, we provide a brief introduction to the Event Calculus
and MLN. Then, in Section 5, we present the proposed probabilistic extension of the Event
Calculus. In Section 6, we study the behaviour of the probabilistic formalism. In Section 7,
we demonstrate the benefits of probabilistic modelling, through experiments in the real-life
activity recognition application. Finally, in Sections 8 and 9, we present related work and
outline directions for further research.

2. Running Example: Activity Recognition 

To demonstrate our method, we apply it to video surveillance in public spaces using the
publicly available benchmark dataset of the CAVIAR project^1. The aim is to recognise
activities that take place between multiple persons, by exploiting information about observed
individual activities. The dataset comprises 28 surveillance videos, where each frame is
annotated by human experts from the CAVIAR team on two levels. The first level contains
simple, derived events (SDE) that concern activities of individual persons or the state of
objects. The second level contains composite event (CE) annotations, describing the
activities between multiple persons and/or objects, e.g. people meeting and moving together,
leaving an object, etc. In this paper, we focus on the recognition of the meeting and moving
CE, for which the dataset contains a sufficient amount of training examples.

The input to our method is a stream of SDE, representing people walking, running, 
staying active, or inactive. We do not process the raw video data in order to recognise such 
individual activities. We make the assumption that this input can be provided by specialised
state-of-the-art methods (e.g. Kosmopoulos et al., 2008). The input stream of SDE is 
represented by a narrative of time-stamped predicates. The first and the last time that a 
person or an object is tracked are represented by the SDE enter and exit. Additionally, the 
coordinates of tracked persons or objects are preprocessed and represented by predicates 
that express qualitative spatial relations, e.g. two persons being relatively close to each 
other. Examples of these predicates are presented in the following sections. 

The definitions of the meeting and moving CE in the Event Calculus were developed by
Artikis, Skarlatidis, & Paliouras (2010b). These definitions take the form of common-sense
rules and describe the conditions under which a CE is starting or ending. For example,
when two persons are walking together with the same orientation, then moving starts being
recognised. Similarly, when the same persons walk away from each other, then moving stops
being recognised.

1. http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1

Skarlatidis A., Paliouras G., Artikis A. & Vouros G.

Predicate                 Meaning

initially_p(F)            Fluent F holds at time-point 0
initially_n(F)            Fluent F does not hold at time-point 0
holdsAt(F, T)             Fluent F holds at time-point T
happens(E, T)             Event E occurs at time-point T
initiates(E, F, T)        Event E initiates fluent F at time-point T
terminates(E, F, T)       Event E terminates fluent F at time-point T
clipped(F, T0, T1)        Fluent F is terminated at some time-point in the interval [T0, T1)
declipped(F, T0, T1)      Fluent F is initiated at some time-point in the interval [T0, T1)

Table 1: The Event Calculus predicates in classical logic.

Based on the input stream of SDE and the CE definitions, the aim is to recognise
instances of the two CE of interest. Due to the presence of uncertainty, the CE definitions
are imperfect. As a result, the definitions do not lead to perfect recognition of the CE.

3. The Event Calculus 

The Event Calculus, originally introduced by Kowalski and Sergot (1986), is a many-sorted
first-order predicate calculus for reasoning about events and their effects. A number of
different dialects have been proposed using either logic programming or classical logic (Shanahan,
1999; Miller & Shanahan, 2002). Most Event Calculus dialects share the same ontology and
core domain-independent axioms. The ontology consists of time-points, events and fluents.
The underlying time model is often linear and may represent time-points as real or integer
numbers. A fluent is a property whose value may change over time. When an event occurs
it may change the value of a fluent. The core domain-independent axioms define whether a
fluent holds or not at a specific time-point. Moreover, the axioms incorporate the common
sense law of inertia, according to which fluents persist over time, unless they are affected
by the occurrence of some event.

We base our model on an axiomatisation of the Event Calculus in classical first-order
logic. As a starting point, we use a subset of the Full Event Calculus, proposed by
Shanahan (1999). For simplicity and without loss of generality, the predicate releases is excluded.
This predicate is domain-dependent and defines under which conditions the law of inertia
for a fluent is disabled. All fluents, therefore, are subject to inertia. Table 1 summarises the
elements of the Event Calculus. Variables (starting with an upper-case letter) are assumed
to be universally quantified unless otherwise indicated. Predicates, functions and constants
start with a lower-case letter. In this work we consider finite domains of time-points, events
and fluents, that are represented by the finite sets $\mathcal{T}$, $\mathcal{E}$ and $\mathcal{F}$, respectively. Similarly, all
individual entities that appear in a particular event recognition task, e.g. persons, objects,
etc., are represented by the constants of the finite set $\mathcal{O}$.

The domain-independent axioms of the Event Calculus are presented below, where
$F \in \mathcal{F}$, $E \in \mathcal{E}$ and $T, T_0, T_1 \in \mathcal{T}$. Specifically, the axioms that determine when a
fluent holds are defined as follows:

\[
\mathit{holdsAt}(F, T) \Leftarrow \mathit{initially}_p(F) \wedge \neg\mathit{clipped}(F, 0, T) \tag{1}
\]

\[
\begin{aligned}
\mathit{holdsAt}(F, T) \Leftarrow{} & \mathit{happens}(E, T_0) \wedge \mathit{initiates}(E, F, T_0) \wedge {} \\
& T_0 < T \wedge \neg\mathit{clipped}(F, T_0, T)
\end{aligned}
\tag{2}
\]

According to axiom (1), a fluent holds at time $T$ if it held initially and has not been
terminated in the interval $[0, T)$. According to axiom (2), a fluent holds at time $T$ if it was
initiated at some earlier time $T_0$ and has not been terminated between $T_0$ and $T$.
The axioms that determine when a fluent does not hold are defined below:

\[
\neg\mathit{holdsAt}(F, T) \Leftarrow \mathit{initially}_n(F) \wedge \neg\mathit{declipped}(F, 0, T) \tag{3}
\]

\[
\begin{aligned}
\neg\mathit{holdsAt}(F, T) \Leftarrow{} & \mathit{happens}(E, T_0) \wedge \mathit{terminates}(E, F, T_0) \wedge {} \\
& T_0 < T \wedge \neg\mathit{declipped}(F, T_0, T)
\end{aligned}
\tag{4}
\]

Axiom (3) defines that a fluent does not hold at time $T$ if it did not hold initially and has
not been initiated in the interval $[0, T)$. Axiom (4) defines that a fluent does not hold at
time $T$ if it was terminated earlier at $T_0$ and has not been initiated between $T_0$ and $T$.
The auxiliary domain-independent predicates clipped and declipped are defined below:

\[
\mathit{clipped}(F, T_0, T_1) \Leftrightarrow \exists E, T \; \big[ \mathit{happens}(E, T) \wedge T_0 \leq T \wedge T < T_1 \wedge \mathit{terminates}(E, F, T) \big] \tag{5}
\]

\[
\mathit{declipped}(F, T_0, T_1) \Leftrightarrow \exists E, T \; \big[ \mathit{happens}(E, T) \wedge T_0 \leq T \wedge T < T_1 \wedge \mathit{initiates}(E, F, T) \big] \tag{6}
\]

According to axiom (5), a fluent is clipped in $[T_0, T_1)$ when the occurrence of an event
terminates the fluent in that interval. In the same manner, axiom (6) defines that a fluent
is declipped in $[T_0, T_1)$ when the occurrence of an event initiates the fluent in that interval.
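
Axioms (1)-(6) can be read operationally over a finite narrative. The following is a minimal Python sketch of that reading, not the authors' implementation: the names `Narrative`, `Effect`, `holds_at` and the toy light-switch domain are assumptions introduced here for illustration, and `initiates`/`terminates` are supplied as plain functions standing in for domain-dependent rules.

```python
from typing import Callable, FrozenSet, Optional, Set, Tuple

Narrative = Set[Tuple[str, int]]          # ground happens(event, time) facts
Effect = Callable[[str, str, int], bool]  # initiates/terminates(event, fluent, time)


def clipped(f: str, t0: int, t1: int, happens: Narrative, terminates: Effect) -> bool:
    """Axiom (5): some event terminates f at a time-point in [t0, t1)."""
    return any(t0 <= t < t1 and terminates(e, f, t) for e, t in happens)


def declipped(f: str, t0: int, t1: int, happens: Narrative, initiates: Effect) -> bool:
    """Axiom (6): some event initiates f at a time-point in [t0, t1)."""
    return any(t0 <= t < t1 and initiates(e, f, t) for e, t in happens)


def holds_at(f: str, t: int, happens: Narrative,
             initiates: Effect, terminates: Effect,
             initially_p: FrozenSet[str] = frozenset(),
             initially_n: FrozenSet[str] = frozenset()) -> Optional[bool]:
    """True/False when exactly one of the two cases is derivable, else None."""
    # Axiom (1): held initially and not terminated in [0, t).
    pos = f in initially_p and not clipped(f, 0, t, happens, terminates)
    # Axiom (2): initiated at an earlier t0 and not terminated in [t0, t).
    pos = pos or any(
        t0 < t and initiates(e, f, t0)
        and not clipped(f, t0, t, happens, terminates)
        for e, t0 in happens)
    # Axioms (3) and (4): the symmetric conditions for "does not hold".
    neg = f in initially_n and not declipped(f, 0, t, happens, initiates)
    neg = neg or any(
        t0 < t and terminates(e, f, t0)
        and not declipped(f, t0, t, happens, initiates)
        for e, t0 in happens)
    if pos == neg:      # nothing derivable, or contradictory rules
        return None
    return pos


# Hypothetical toy domain: a light switched on at 10 and off at 20.
narrative = {("switch_on", 10), ("switch_off", 20)}


def initiates(e: str, f: str, t: int) -> bool:
    return e == "switch_on" and f == "light"


def terminates(e: str, f: str, t: int) -> bool:
    return e == "switch_off" and f == "light"
```

In this toy domain, querying `holds_at("light", 15, narrative, initiates, terminates, initially_n={"light"})` derives that the fluent holds through axiom (2), while the same query at time-point 25 derives that it does not hold through axiom (4).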

The predicates happens, initiates and terminates are defined only in a domain-dependent
manner. The predicate happens expresses the input evidence, determining the occurrence of
an SDE at a specific time-point. A stream of observed SDE, therefore, is represented in the
Event Calculus as a narrative of ground happens predicates. As an example, consider the
following fragment of a narrative:

\[
\begin{array}{l}
\mathit{happens}(\mathit{walking}(id_1), 99) \\
\mathit{happens}(\mathit{walking}(id_2), 99) \\
\mathit{happens}(\mathit{walking}(id_1), 100) \\
\mathit{happens}(\mathit{walking}(id_2), 100) \\[1ex]
\mathit{happens}(\mathit{active}(id_1), 500) \\
\mathit{happens}(\mathit{active}(id_2), 500)
\end{array}
\]

According to the above narrative, it has been observed that two persons $id_1$ and $id_2$ are
walking, e.g. at time-points 99 and 100, and later at time-point 500 they are active, i.e. they
are moving their arms but staying at the same position.

The predicates initiates and terminates specify under which circumstances a fluent,
representing a CE, is to be initiated or terminated at a specific time-point. In our example
application, for instance, the moving activity of two persons is terminated when both of
them are active. This termination case can be represented using the following rule:

\[
\mathit{terminates}(\mathit{active}(\mathit{ID}_1), \mathit{moving}(\mathit{ID}_1, \mathit{ID}_2), T) \Leftarrow \mathit{happens}(\mathit{active}(\mathit{ID}_2), T) \tag{7}
\]

Based on a narrative of SDE and a knowledge base composed of domain-dependent
CE definitions (e.g. rule (7)) and the domain-independent Event Calculus axioms (1)-(6),
we can infer whether a fluent holds or not at any time-point. When a fluent holds at a
specific time-point, then the corresponding CE is considered as recognised. For example,
the moving CE between persons $id_1$ and $id_2$ is recognised at time-point 100 by inferring that
$\mathit{holdsAt}(\mathit{moving}(id_1, id_2), 100)$ is True. Similarly, the moving CE for the same persons is
not recognised at time-point 501 by inferring that $\mathit{holdsAt}(\mathit{moving}(id_1, id_2), 501)$ is False.

4. Markov Logic Networks

Although the Event Calculus can compactly represent complex event relations, it does not 
handle uncertainty adequately. A knowledge base of Event Calculus axioms and composite 
event (CE) definitions is defined by a set of first-order logic formulas. Each formula imposes 
a (hard) constraint over the set of possible worlds, that is, Herbrand interpretations. A 
missed or an erroneous simple, derived event (SDE) detection can have a significant effect 
on the event recognition results. For example, an initiation may be based on an erroneously 
detected SDE, causing the recognition of a CE with absolute certainty. 

We employ the framework of Markov Logic Networks^2 (MLN) (Domingos & Lowd, 2009)
in order to soften these constraints and perform probabilistic inference. In MLN, each
formula $F_i$ is represented in first-order logic and is associated with a weight value $w_i \in \mathbb{R}$.
The higher the value of weight $w_i$, the stronger the constraint represented by formula $F_i$.
In contrast to classical logic, all worlds in MLN are possible with a certain probability.
The main idea behind this is that the probability of a world increases as the number of
formulas it violates decreases. A knowledge base in MLN may contain both hard and
soft-constrained formulas. Hard-constrained formulas are associated with an infinite weight value
and capture the knowledge which is assumed to be certain. Therefore, an acceptable world
must at least satisfy the hard constraints. Soft constraints capture imperfect knowledge in
the domain, allowing for the existence of worlds in which this knowledge is violated.

2. Systems implementing MLN reasoning and learning algorithms can be found in the following links:
http://alchemy.cs.washington.edu
http://research.cs.wisc.edu/hazy/tuffy
http://code.google.com/p/thebeast
http://ias.cs.tum.edu/probcog-wiki

Formally, a knowledge base $L$ of weighted formulas together with a finite domain of
constants $\mathcal{C}$ is transformed into a ground Markov network $M_{L,\mathcal{C}}$, where $L$ consists of Event
Calculus axioms and CE definitions, and $\mathcal{C} = \mathcal{T} \cup \mathcal{O} \cup \mathcal{E} \cup \mathcal{F}$. All formulas are converted into
clausal form and each clause is ground according to the domain of its distinct variables.
The nodes in $M_{L,\mathcal{C}}$ are Boolean random variables, each one corresponding to a possible
grounding of a predicate that appears in $L$. The predicates of a ground clause form a clique
in $M_{L,\mathcal{C}}$. Each clique is associated with a corresponding weight $w_i$ and a Boolean feature,
taking the value 1 when the ground clause is true and 0 otherwise. The ground $M_{L,\mathcal{C}}$ defines
a probability distribution over possible worlds and is represented as a log-linear model.
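
The grounding step described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the clause encoding, the function name `ground` and the toy clause `¬p(X) ∨ q(X, T)` are hypothetical and are not taken from any MLN system. Each grounding lists the ground atoms that would form one clique, and the distinct ground atoms are the nodes of the network.

```python
from itertools import product


def ground(clause, domains):
    """Yield every grounding of a clause over finite domains.

    clause:  list of (positive, predicate, args) literals, where each arg
             is a variable name that appears as a key in `domains`.
    domains: maps each variable name to the list of its constants.
    Each grounding is a tuple of (positive, ground_atom) pairs; the ground
    atoms of one grounding form a clique in the ground Markov network.
    """
    variables = sorted({a for _, _, args in clause for a in args})
    for values in product(*(domains[v] for v in variables)):
        binding = dict(zip(variables, values))
        yield tuple(
            (positive, (pred,) + tuple(binding[a] for a in args))
            for positive, pred, args in clause
        )


# Hypothetical clause  not p(X) or q(X, T)  over toy domains.
toy_clause = [(False, "p", ("X",)), (True, "q", ("X", "T"))]
toy_domains = {"X": ["a", "b"], "T": [0, 1, 2]}

groundings = list(ground(toy_clause, toy_domains))
nodes = {atom for grounding in groundings for _, atom in grounding}
```

Here the number of groundings is the product of the domain sizes of the distinct variables (2 x 3), which is the quantity that Section 5 seeks to keep small.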

In event recognition we aim to recognise CE of interest given the observed streams of 
SDE. For this reason we focus on discriminative MLN (Singla & Domingos, 2005), that
are similar to Conditional Random Fields (Lafferty, McCallum, & Pereira, 2001; Sutton &
McCallum, 2007). Specifically, the set of random variables in $M_{L,\mathcal{C}}$ can be partitioned into
two subsets. The former is the set of evidence random variables X, formed by a narrative of 
input ground happens predicates and preprocessed spatial constraints. The latter is the set 
of random variables Y that correspond to groundings of query holdsAt predicates, as well as 
groundings of any other hidden/unobserved predicates. The joint probability distribution 
of a possible assignment of $Y = y$, conditioned over a given assignment of $X = x$, is defined
as follows:

\[
P(Y = y \mid X = x) = \frac{1}{Z(x)} \exp\left( \sum_{i=1}^{|F_c|} w_i \, n_i(x, y) \right) \tag{8}
\]

The vectors $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ represent a possible assignment of evidence $X$ and query/hidden
variables $Y$, respectively. $\mathcal{X}$ and $\mathcal{Y}$ are the sets of possible assignments that the evidence $X$
and query/hidden variables $Y$ can take. $F_c$ is the set of clauses produced from the knowledge
base $L$ and the domain of constants $\mathcal{C}$. The scalar value $w_i$ is the weight of the $i$-th clause
and $n_i(x, y)$ is the number of satisfied groundings of the $i$-th clause in $x$ and $y$. $Z(x)$ is
the partition function, that normalises over all possible assignments $y' \in \mathcal{Y}$ of query/hidden
variables given the assignment $x$, that is, $Z(x) = \sum_{y' \in \mathcal{Y}} \exp\left( \sum_{i=1}^{|F_c|} w_i \, n_i(x, y') \right)$.
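
For a very small model, the conditional distribution of equation (8) can be computed by exhaustive enumeration. The sketch below does so in Python for a hypothetical toy model with one evidence atom and two hidden atoms. The clauses, weights and function names are assumptions made for illustration only, and each clause has a single grounding, so each count $n_i$ is either 0 or 1.

```python
import itertools
import math

# Hypothetical toy model: one evidence atom x, two hidden atoms y = (y0, y1).
# Each entry is (weight w_i, count function n_i(x, y)) for one clause.
CLAUSES = [
    (1.5, lambda x, y: int((not x) or y[0])),     # x => y0
    (0.8, lambda x, y: int((not y[0]) or y[1])),  # y0 => y1
]


def score(x, y):
    """Exponent of equation (8): sum_i w_i * n_i(x, y)."""
    return sum(w * n(x, y) for w, n in CLAUSES)


def conditional(x, num_hidden=2):
    """Return P(Y = y | X = x) for every assignment y, by enumeration."""
    worlds = list(itertools.product([False, True], repeat=num_hidden))
    z = sum(math.exp(score(x, y)) for y in worlds)  # partition function Z(x)
    return {y: math.exp(score(x, y)) / z for y in worlds}


def marginal(x, index):
    """P(y_index = True | x): add up the probabilities of matching worlds."""
    return sum(p for y, p in conditional(x).items() if y[index])
```

The enumeration visits all $2^{|Y|}$ assignments, which is only feasible for toy models and is the reason approximate inference is needed in practice.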

Equation (8) represents a single exponential model for the joint probability of the entire 
set of query variables that is globally conditioned on a set of observables. Such a conditional 
model can have a much simpler structure than a full joint model, e.g. a Bayesian Network. 
By modelling the conditional distribution directly, the model is not affected by potential 
dependencies between the variables in X and can ignore them. The model also makes 
independence assumptions among the random variables Y , and defines by its structure the 
dependencies of Y on X. Furthermore, conditioning on a specific assignment x, given by the 
observed SDE, reduces significantly the number of possible worlds and inference becomes
much more efficient (Singla & Domingos, 2005; Minka, 2005; Sutton & McCallum, 2007). 

Directly computing equation (8) is intractable, because the value of $Z(x)$ depends on the
relationship among all clauses in the knowledge base. For this reason, a variety of efficient
inference algorithms have been proposed in the literature, e.g. based on local search and
sampling (Poon & Domingos, 2006; Singla & Domingos, 2006; Biba et al., 2011), variants
of Belief Propagation (Singla & Domingos, 2008; Kersting, Ahmadi, & Natarajan, 2009),
Integer Linear Programming (Riedel, 2008), etc. In this work we perform marginal
inference to compute $P(\mathit{holdsAt}(CE, T) = \mathit{True} \mid SDE)$, which is the probability of a
CE to hold and thus be considered as recognised, given a narrative of observed SDE. To
compute this probability we employ the state-of-the-art sampling algorithm MC-SAT (Poon
& Domingos, 2006). Even in large state spaces the algorithm can efficiently approximate this
probability, by combining Markov Chain Monte Carlo sampling with satisfiability testing.
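
To make the target of marginal inference explicit, the queried probability can be written in terms of equation (8). The notation below is introduced here for illustration: $q$ indexes the query atom $\mathit{holdsAt}(CE, T)$ within $y$, $N$ is the number of samples and $y^{(s)}$ is the $s$-th sampled assignment. The second expression is the generic sample-average estimate that any sampler, including MC-SAT, can be used to form; it is not a description of the internals of MC-SAT.

\[
P(\mathit{holdsAt}(CE, T) = \mathit{True} \mid X = x)
= \sum_{y \in \mathcal{Y} \,:\, y_q = 1} P(Y = y \mid X = x)
\approx \frac{1}{N} \sum_{s=1}^{N} \mathbb{1}\big[ y^{(s)}_q = 1 \big]
\]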

The weights of the soft-constrained clauses can be estimated from training data by
minimising the negative conditional log-likelihood function derived from equation (8), using
either first-order or second-order optimisation methods (Singla & Domingos, 2005; Lowd
& Domingos, 2007; Huynh & Mooney, 2009, 2011). First-order methods apply standard
gradient descent optimisation techniques, e.g. the voted perceptron algorithm (Collins, 2002;
Singla & Domingos, 2005), while second-order methods pick a search direction based on the
quadratic approximation of the target function. As stated by Lowd and Domingos (2007),
no single learning rate is appropriate for all clauses, since in a training set some clauses can
have a significantly greater number of satisfied groundings than others. This situation causes
the standard gradient descent methods to converge very slowly. To avoid this problem, we
have chosen to use the second-order Diagonal Newton algorithm (Singla & Domingos, 2005).
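
The gradient used by these methods follows directly from equation (8). Taking the logarithm gives $\log P(Y = y \mid X = x) = \sum_i w_i \, n_i(x, y) - \log Z(x)$, and differentiating $\log Z(x)$ with respect to $w_i$ yields the expected count of satisfied groundings under the current weights. The expectation symbol $E_w$ and the step size $\alpha$ below are notation assumed here for illustration:

\[
g_i = -\frac{\partial}{\partial w_i} \log P(Y = y \mid X = x)
= E_w\big[ n_i(x, Y) \big] - n_i(x, y),
\qquad
E_w\big[ n_i(x, Y) \big] = \sum_{y' \in \mathcal{Y}} P(Y = y' \mid X = x) \, n_i(x, y')
\]

A plain gradient step uses the same rate for every clause, $w_i \leftarrow w_i - \alpha \, g_i$. The Diagonal Newton update, as described by Lowd and Domingos (2007), instead rescales each component by the variance of its count, which is the corresponding diagonal entry of the Hessian:

\[
w_i \leftarrow w_i - \alpha \, \frac{g_i}{E_w\big[ n_i^2 \big] - \big( E_w[ n_i ] \big)^2}
\]

Clauses with many groundings tend to have large count variances, so their steps are scaled down accordingly.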

5. Probabilistic Event Calculus 

To perform probabilistic inference, MLN ground the entire knowledge base, including the
domain-independent Event Calculus axioms. For example, axiom (1) produces one clause
and has two distinct variables $F$ and $T$. Therefore, the number of its groundings is
determined by the Cartesian product of the corresponding variable-binding constants, that
is $|\mathcal{F}| \times |\mathcal{T}|$. Assuming that the domain of fluents $\mathcal{F}$ is relatively small compared to the
domain of time-points $\mathcal{T}$, the number of groundings of axiom (1) grows linearly with the
number of time-points. Axioms (5) and (6) are triply quantified over time-point variables
($T_0$, $T_1$ and $T$) and therefore, the number of their groundings has a cubic relation to the
number of time-points. In addition, variables $E$ and $T$ are existentially quantified. During
MLN grounding, existentially quantified formulas are replaced by the disjunction of their
groundings (Domingos & Lowd, 2009). This leads to clauses with a large number of
disjunctions and a combinatorial explosion of the number of clauses that are generated from
axioms (5) and (6). Therefore, representing axioms (1)-(6) directly in MLN is not practical
for real-world event recognition, as these axioms lead to an unmanageably large Markov
network.
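
The growth rates discussed above can be checked with simple arithmetic. The Python sketch below counts the joint variable bindings of a formula as the product of the domain sizes of its distinct variables. The domain sizes and the function name `count_bindings` are hypothetical. For axiom (5) the count includes the existentially quantified variables, which after grounding show up as disjuncts inside clauses rather than as separate clauses, so the number is a rough measure of network size rather than an exact clause count.

```python
from math import prod


def count_bindings(variables, domain_sizes):
    """Number of joint bindings of a formula's distinct variables.

    variables:    maps each variable name to its sort.
    domain_sizes: maps each sort to the number of constants of that sort.
    """
    return prod(domain_sizes[sort] for sort in variables.values())


# Hypothetical domain sizes, chosen only for illustration.
sizes = {"fluent": 2, "event": 10, "time": 1000}

axiom_1 = {"F": "fluent", "T": "time"}
axiom_5 = {"F": "fluent", "E": "event",
           "T0": "time", "T1": "time", "T": "time"}

bindings_1 = count_bindings(axiom_1, sizes)  # |F| x |T|
bindings_5 = count_bindings(axiom_5, sizes)  # |F| x |E| x |T|^3
```

With these sizes, doubling the number of time-points doubles the count for axiom (1) but multiplies the count for axiom (5) by eight.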

5.1 Representation

To eliminate the triply quantified axioms, a discrete version of the Event Calculus can be
used instead. The Discrete Event Calculus (DEC) has been proved to be logically equivalent
to the Event Calculus when the domain of time-points is limited to integers (Mueller, 2008).
The original DEC^3 is composed of twelve domain-independent axioms. However, for the
task of event recognition, we focus only on the domain-independent axioms that determine
the influence of events on fluents and the inertia of fluents. We do not consider the predicates
and axioms stating when a fluent is not subject to inertia (releases and releasedAt), as well
as its discrete change based on some domain-specific mathematical function (trajectory and
antiTrajectory).

The DEC axioms that determine when a fluent holds are defined as follows:

\[
\mathit{holdsAt}(F, T+1) \Leftarrow \mathit{happens}(E, T) \wedge \mathit{initiates}(E, F, T) \tag{9}
\]

\[
\mathit{holdsAt}(F, T+1) \Leftarrow \mathit{holdsAt}(F, T) \wedge \neg\exists E \; \big[ \mathit{happens}(E, T) \wedge \mathit{terminates}(E, F, T) \big] \tag{10}
\]

According to axiom (9), when an event $E$ that initiates a fluent $F$ occurs at time $T$, the
fluent holds at the next time-point. Axiom (10) implements the inertia of fluents, dictating
that a fluent continues to hold unless an event terminates it.

The axioms that determine when a fluent does not hold are defined similarly:

\[
\neg\mathit{holdsAt}(F, T+1) \Leftarrow \mathit{happens}(E, T) \wedge \mathit{terminates}(E, F, T) \tag{11}
\]

\[
\neg\mathit{holdsAt}(F, T+1) \Leftarrow \neg\mathit{holdsAt}(F, T) \wedge \neg\exists E \; \big[ \mathit{happens}(E, T) \wedge \mathit{initiates}(E, F, T) \big] \tag{12}
\]

Axiom (11) states that when an event $E$ that terminates a fluent $F$ occurs at time $T$, then
the fluent does not hold at the next time-point. Axiom (12) specifies that a fluent continues
not to hold unless an event initiates it.

Compared to the Event Calculus presented in Section 3, DEC axioms are defined over
successive time-points. Axioms (9)-(12) are quantified over a single time-point variable.
As a result, the number of ground clauses is substantially smaller than in the full Event
Calculus. Axioms (10) and (12), however, still contain the existentially quantified variable
$E$ over a conjunction of predicates. Each of these axioms will be transformed into $2^{|\mathcal{E}|}$
clauses, each producing $|\mathcal{F}| \times |\mathcal{T}|$ groundings^4. Moreover, each ground clause will contain a
large number of disjunctions, causing large cliques in the ground Markov network.
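
The exponential number of clauses can be seen on the smallest non-trivial case. Assume, for illustration only, a domain of two events $e_1$ and $e_2$, and abbreviate $h_k = \mathit{happens}(e_k, T)$ and $t_k = \mathit{terminates}(e_k, F, T)$. In clausal form, axiom (10) is the disjunction of $\mathit{holdsAt}(F, T+1)$, $\neg\mathit{holdsAt}(F, T)$ and the existentially quantified conjunction, which expands and distributes as follows:

\[
\exists E \; \big[ \mathit{happens}(E, T) \wedge \mathit{terminates}(E, F, T) \big]
\;\equiv\; (h_1 \wedge t_1) \vee (h_2 \wedge t_2)
\;\equiv\; (h_1 \vee h_2) \wedge (h_1 \vee t_2) \wedge (t_1 \vee h_2) \wedge (t_1 \vee t_2)
\]

Each of the four conjuncts yields one clause once $\mathit{holdsAt}(F, T+1) \vee \neg\mathit{holdsAt}(F, T)$ is added to it, giving $2^2$ clauses. With $|\mathcal{E}|$ events, every clause picks one of the two literals from each of the $|\mathcal{E}|$ conjunctions, which gives $2^{|\mathcal{E}|}$ clauses of $|\mathcal{E}| + 2$ literals each.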

3. http://decreasoner.sourceforge.net

4. Mueller (2008) uses the technique of sub-formula renaming (Nonnengart & Weidenbach, 2001) in order to
avoid the creation of $2^{|\mathcal{E}|}$ clauses. However, the existential quantification remains in the axioms, causing
the creation of large cliques in the ground network. Additionally, sub-formula renaming introduces utility
predicates that do not belong in the input evidence, creating hidden variables in the network. As a result,
the complexity of inference and learning increases.

Predicate                 Meaning

happens(E, T)             Event E occurs at time-point T
holdsAt(F, T)             Fluent F holds at time-point T
initiatedAt(F, T)         Fluent F is initiated at time-point T
terminatedAt(F, T)        Fluent F is terminated at time-point T

Table 2: The probabilistic Event Calculus predicates.

In order to eliminate the existential quantification and reduce further the number of
variables, we adopt a similar representation to that of Artikis et al. (2010), where the initiation
and termination predicates (represented by the predicates initiatedAt and terminatedAt,
respectively) are only defined in terms of fluents and time-points. By using this
representation, the domain-independent axioms of our MLN-based Event Calculus (MLN-EC)
are defined only in terms of the universally quantified fluents and time-points. Table 2
summarises the elements of the proposed MLN-EC, where $F \in \mathcal{F}$, $E \in \mathcal{E}$ and $T \in \mathcal{T}$.

The MLN-EC axioms that determine when a fluent holds are defined as follows:

\[
\mathit{holdsAt}(F, T+1) \Leftarrow \mathit{initiatedAt}(F, T) \tag{13}
\]

\[
\mathit{holdsAt}(F, T+1) \Leftarrow \mathit{holdsAt}(F, T) \wedge \neg\mathit{terminatedAt}(F, T) \tag{14}
\]

Axiom (13) defines that if a fluent $F$ is initiated at time $T$, then it holds at the next
time-point. Axiom (14) specifies that a fluent continues to hold unless it is terminated.
The axioms that determine when a fluent does not hold are defined similarly:

\[
\neg\mathit{holdsAt}(F, T+1) \Leftarrow \mathit{terminatedAt}(F, T) \tag{15}
\]

\[
\neg\mathit{holdsAt}(F, T+1) \Leftarrow \neg\mathit{holdsAt}(F, T) \wedge \neg\mathit{initiatedAt}(F, T) \tag{16}
\]

Axiom (15) defines that if a fluent $F$ is terminated at time $T$ then it does not hold at
the next time-point. According to axiom (16), a fluent continues not to hold unless it is
initiated.

According to this representation, a domain-dependent rule, e.g. the initiation and/or
termination of some $\mathit{fluent}_1$ over some entities $X$ and $Y$, takes the following general form:

\[
\begin{aligned}
\mathit{initiatedAt}(\mathit{fluent}_1(X, Y), T) \Leftarrow{} & \mathit{happens}(\mathit{event}_i(X), T) \wedge \dots \wedge {} \\
& \mathit{holdsAt}(\mathit{fluent}_j(X), T) \wedge \dots \wedge {} \\
& \mathit{Conditions}[X, Y, T] \\[1ex]
\mathit{terminatedAt}(\mathit{fluent}_1(X, Y), T) \Leftarrow{} & \mathit{happens}(\mathit{event}_k(X), T) \wedge \dots \wedge {} \\
& \mathit{holdsAt}(\mathit{fluent}_l(X), T) \wedge \dots \wedge {} \\
& \mathit{Conditions}[X, Y, T]
\end{aligned}
\tag{17}
\]

$\mathit{Conditions}[X, Y, T]$ is a set of predicates that introduce further constraints in the
definition, referring to time $T \in \mathcal{T}$ and entities $X, Y \in \mathcal{O}$. The predicates happens and holdsAt,
as well as those appearing in $\mathit{Conditions}[X, Y, T]$, may also be negated. The initiation and
termination of a fluent can be defined by more than one rule, each capturing a different
initiation and termination case. With the use of happens predicates, we can define a CE
over SDE observations. Similarly, with the holdsAt predicate, we can define a CE over other
CE, in order to create hierarchies of CE definitions. In both initiatedAt and terminatedAt
rules, the use of happens, holdsAt and $\mathit{Conditions}[X, Y, T]$ is optional and varies according
to the requirements of the target event recognition application.
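
Read without weights, axioms (13)-(16) amount to a single forward pass over the time-line. The Python sketch below shows this crisp reading for one fluent. It is an illustration under assumptions, not the probabilistic model: the function name `holds_over_time` is hypothetical, and the two input sets are assumed to list every time-point at which initiatedAt and terminatedAt are true, so that any other time-point counts as neither initiated nor terminated.

```python
def holds_over_time(initiated, terminated, horizon, holds_at_zero=False):
    """Crisp reading of axioms (13)-(16) for a single fluent.

    initiated / terminated: sets of time-points at which initiatedAt /
    terminatedAt is true (assumed complete, see the lead-in).
    Returns a list h of length horizon + 1 with h[t] == holdsAt(F, t).
    """
    holds = [holds_at_zero]
    for t in range(horizon):
        if t in initiated and t in terminated:
            # Axioms (13) and (15) would derive both holdsAt and its negation.
            raise ValueError(f"conflicting initiation and termination at {t}")
        if t in initiated:        # axiom (13)
            holds.append(True)
        elif t in terminated:     # axiom (15)
            holds.append(False)
        else:                     # inertia: axioms (14) and (16)
            holds.append(holds[t])
    return holds


# Hypothetical example: initiated at time-point 2, terminated at time-point 5.
timeline = holds_over_time(initiated={2}, terminated={5}, horizon=8)
```

In the example the fluent holds at time-points 3, 4 and 5, since initiation and termination take effect at the next time-point.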

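Consider the following definition of the meeting CE between two persons in our running
example.

\[
\begin{aligned}
\mathit{initiatedAt}(\mathit{meeting}(\mathit{ID}_1, \mathit{ID}_2), T) \Leftarrow{} & \mathit{happens}(\mathit{active}(\mathit{ID}_1), T) \wedge {} \\
& \neg\mathit{happens}(\mathit{running}(\mathit{ID}_2), T) \wedge {} \\
& \mathit{close}(\mathit{ID}_1, \mathit{ID}_2, 25, T)
\end{aligned}
\tag{18}
\]

\[
\begin{aligned}
\mathit{initiatedAt}(\mathit{meeting}(\mathit{ID}_1, \mathit{ID}_2), T) \Leftarrow{} & \mathit{happens}(\mathit{inactive}(\mathit{ID}_1), T) \wedge {} \\
& \neg\mathit{happens}(\mathit{running}(\mathit{ID}_2), T) \wedge {} \\
& \neg\mathit{happens}(\mathit{active}(\mathit{ID}_2), T) \wedge {} \\
& \mathit{close}(\mathit{ID}_1, \mathit{ID}_2, 25, T)
\end{aligned}
\tag{19}
\]

\[
\begin{aligned}
\mathit{terminatedAt}(\mathit{meeting}(\mathit{ID}_1, \mathit{ID}_2), T) \Leftarrow{} & \mathit{happens}(\mathit{walking}(\mathit{ID}_1), T) \wedge {} \\
& \neg\mathit{close}(\mathit{ID}_1, \mathit{ID}_2, 34, T)
\end{aligned}
\tag{20}
\]

\[
\mathit{terminatedAt}(\mathit{meeting}(\mathit{ID}_1, \mathit{ID}_2), T) \Leftarrow \mathit{happens}(\mathit{running}(\mathit{ID}_1), T) \tag{21}
\]

\[
\mathit{terminatedAt}(\mathit{meeting}(\mathit{ID}_1, \mathit{ID}_2), T) \Leftarrow \mathit{happens}(\mathit{exit}(\mathit{ID}_2), T) \tag{22}
\]

The predicate close expresses a spatial constraint stating that the distance between
persons $\mathit{ID}_1$ and $\mathit{ID}_2$ at time $T$ must be below a specified threshold in pixels, e.g. 25 pixels.
According to rules (18) and (19), the meeting activity is initiated when the people involved
interact with each other, i.e. at least one of them is active or inactive, the other is not
running, and the measured distance between them is at most 25 pixels. The meeting CE is
terminated either when people walk away from each other (rule 20), or someone is running
(rule 21), or has exited the scene (rule 22).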
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The definition of the CE that people are moving together is represented as follows: 

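$$
\begin{aligned}
&\text{initiatedAt}(\text{moving}(ID_1, ID_2),\, T) \Leftarrow \\
&\quad \text{happens}(\text{walking}(ID_1),\, T)\ \wedge \\
&\quad \text{happens}(\text{walking}(ID_2),\, T)\ \wedge \\
&\quad \text{orientationMove}(ID_1, ID_2, T)\ \wedge \\
&\quad \text{close}(ID_1, ID_2, 34, T)
\end{aligned}
\tag{23}
$$

$$
\begin{aligned}
&\text{terminatedAt}(\text{moving}(ID_1, ID_2),\, T) \Leftarrow \\
&\quad \text{happens}(\text{walking}(ID_1),\, T)\ \wedge \\
&\quad \neg\text{close}(ID_1, ID_2, 34, T)
\end{aligned}
\tag{24}
$$

$$
\begin{aligned}
&\text{terminatedAt}(\text{moving}(ID_1, ID_2),\, T) \Leftarrow \\
&\quad \text{happens}(\text{active}(ID_1),\, T)\ \wedge \\
&\quad \text{happens}(\text{active}(ID_2),\, T)
\end{aligned}
\tag{25}
$$

$$
\begin{aligned}
&\text{terminatedAt}(\text{moving}(ID_1, ID_2),\, T) \Leftarrow \\
&\quad \text{happens}(\text{active}(ID_1),\, T)\ \wedge \\
&\quad \text{happens}(\text{inactive}(ID_2),\, T)
\end{aligned}
\tag{26}
$$

$$
\begin{aligned}
&\text{terminatedAt}(\text{moving}(ID_1, ID_2),\, T) \Leftarrow \\
&\quad \text{happens}(\text{running}(ID_1),\, T)
\end{aligned}
\tag{27}
$$

$$
\begin{aligned}
&\text{terminatedAt}(\text{moving}(ID_1, ID_2),\, T) \Leftarrow \\
&\quad \text{happens}(\text{exit}(ID_1),\, T)
\end{aligned}
\tag{28}
$$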

The predicate orientationMove is a spatial constraint, stating that the orientation of two
persons is almost the same (e.g. the difference is below 45 degrees). According to rule
(23), the moving CE is initiated when two persons $ID_1$ and $ID_2$ are walking close to each
other (their distance is at most 34 pixels) with almost the same orientation. The moving
CE is terminated in several cases: (a) as specified by rule (24), when people walk
away from each other, i.e. their distance is larger than 34 pixels; (b) when neither is
actually moving, i.e. both are active, or (c) one is active while the other is inactive,
represented by rules (25) and (26), respectively; (d) finally, when one of them is running
or exiting the scene, represented by rules (27) and (28), respectively.
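
As a minimal sketch of how rules (23)-(28) can be read operationally, the following Python fragment evaluates the initiation and termination conditions of moving on a toy narrative. The data layout (sets of tuples), the function names and the example values are assumptions made for illustration only; they are not part of the formalism or of the CAVIAR data.

```python
def happens(narrative, event, person, t):
    """True if the SDE `event` is observed for `person` at time-point `t`."""
    return (event, person, t) in narrative


def initiated_moving(narrative, close, same_orientation, id1, id2, t):
    """Rule (23): both walk, similar orientation, at most 34 pixels apart."""
    return (happens(narrative, "walking", id1, t)
            and happens(narrative, "walking", id2, t)
            and (id1, id2, t) in same_orientation
            and (id1, id2, 34, t) in close)


def terminated_moving(narrative, close, id1, id2, t):
    """Rules (24)-(28): any one satisfied body terminates the CE."""
    return (
        (happens(narrative, "walking", id1, t)
         and (id1, id2, 34, t) not in close)                 # rule (24)
        or (happens(narrative, "active", id1, t)
            and happens(narrative, "active", id2, t))        # rule (25)
        or (happens(narrative, "active", id1, t)
            and happens(narrative, "inactive", id2, t))      # rule (26)
        or happens(narrative, "running", id1, t)             # rule (27)
        or happens(narrative, "exit", id1, t)                # rule (28)
    )


# Toy narrative (made-up values, not taken from CAVIAR).
narrative = {("walking", "id1", 100), ("walking", "id2", 100),
             ("running", "id1", 200), ("walking", "id2", 200)}
close = {("id1", "id2", 34, 100)}           # true groundings of close
same_orientation = {("id1", "id2", 100)}    # true groundings of orientationMove

print(initiated_moving(narrative, close, same_orientation, "id1", "id2", 100))
print(terminated_moving(narrative, close, "id1", "id2", 200))
```

Both calls evaluate to True for this narrative: rule (23) fires at time-point 100 and rule (27) fires at time-point 200.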

5.2 Knowledge Base Compilation 

A knowledge base with domain-dependent rules in the form of (17) describes explicitly the
conditions under which fluents are initiated or terminated. It is usually impractical to
also define when a fluent is not initiated and not terminated. However, the open-world semantics
of first-order logic result in an inherent uncertainty about the value of a fluent for many
time-points. In other words, if at a specific time-point no event that terminates or initiates
a fluent happens, we cannot rule out the possibility that the fluent has been initiated or
terminated. As a result, it cannot be determined whether a fluent holds or not, causing the
loss of inertia.

This is a variant of the well-known frame problem and one solution for the Event Calculus
in first-order logic is the use of circumscription (McCarthy, 1980; Lifschitz, 1994; Shanahan,
1997; Doherty et al., 1997; Mueller, 2008). The aim of circumscription is to automatically
rule out all those conditions which are not explicitly entailed by the given formulas. Hence,
circumscription introduces a closed-world assumption to first-order logic.

Technically, we perform circumscription by predicate completion, a syntactic
transformation where formulas are translated into logically stronger ones. In particular,
we perform a knowledge compilation procedure in which predicate completion is
computed for both initiatedAt and terminatedAt predicates. Due to the form of CE
definitions (see formalisation (17)), the result of predicate completion is specialised for
every CE separately, e.g. initiatedAt(meeting($ID_1$, $ID_2$), T) as opposed to initiatedAt(F, T).
Similar to Mueller (2008), we also eliminate the initiatedAt and terminatedAt predicates
from the knowledge base, by exploiting the equivalences resulting from predicate
completion. In cases where the definitions of the initiation or termination of a specific CE are
missing, the corresponding initiation or termination is considered False for all time-points,
e.g. $\text{terminatedAt}(\text{fluent}(X, Y),\, T) \Leftrightarrow \text{False}$.
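
The grouping step behind predicate completion can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: rules are represented as (head, body) string pairs, the body strings are abbreviated placeholders rather than a real rule syntax, and the function name `predicate_completion` is hypothetical.

```python
from collections import defaultdict


def predicate_completion(rules, heads):
    """Return {head: completed body}, reading each entry as head <=> body.

    rules: list of (head, body) pairs, one pair per domain-dependent rule.
    heads: every head to complete; a head without rules is completed to False.
    """
    bodies = defaultdict(list)
    for head, body in rules:
        bodies[head].append(body)
    completed = {}
    for head in heads:
        if bodies[head]:
            # The head holds exactly when at least one rule body holds.
            completed[head] = " v ".join(f"({b})" for b in bodies[head])
        else:
            completed[head] = "False"
    return completed


# Abbreviated rule bodies; the strings are placeholders, not a real syntax.
rules = [
    ("initiatedAt(meeting)", "active1 ^ !running2 ^ close25"),
    ("initiatedAt(meeting)", "inactive1 ^ !running2 ^ !active2 ^ close25"),
]
heads = ["initiatedAt(meeting)", "terminatedAt(fluent)"]
for head, body in predicate_completion(rules, heads).items():
    print(head, "<=>", body)
```

The head without any rule is completed to False, which mirrors the treatment of missing initiation or termination definitions described above.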

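To illustrate the form of the resulting knowledge base, consider the domain-dependent
definition of meeting, i.e. rules (18)-(22). After predicate completion, these rules will be
replaced by the following formulas:

$$
\begin{aligned}
&\text{initiatedAt}(\text{meeting}(ID_1, ID_2),\, T) \Leftrightarrow \\
&\quad \big(\text{happens}(\text{active}(ID_1),\, T)\ \wedge \\
&\quad\ \ \neg\text{happens}(\text{running}(ID_2),\, T)\ \wedge \\
&\quad\ \ \text{close}(ID_1, ID_2, 25, T)\big)\ \vee \\
&\quad \big(\text{happens}(\text{inactive}(ID_1),\, T)\ \wedge \\
&\quad\ \ \neg\text{happens}(\text{running}(ID_2),\, T)\ \wedge \\
&\quad\ \ \neg\text{happens}(\text{active}(ID_2),\, T)\ \wedge \\
&\quad\ \ \text{close}(ID_1, ID_2, 25, T)\big)
\end{aligned}
\tag{29}
$$

$$
\begin{aligned}
&\text{terminatedAt}(\text{meeting}(ID_1, ID_2),\, T) \Leftrightarrow \\
&\quad \big(\text{happens}(\text{walking}(ID_1),\, T)\ \wedge \\
&\quad\ \ \neg\text{close}(ID_1, ID_2, 25, T)\big)\ \vee \\
&\quad \text{happens}(\text{running}(ID_1),\, T)\ \vee \\
&\quad \text{happens}(\text{exit}(ID_1),\, T)
\end{aligned}
\tag{30}
$$

The resulting rules (29) and (30) define all conditions under which the meeting CE is
initiated or terminated. Any other event occurrence cannot affect this CE, as it cannot
initiate the CE or terminate it. Based on the equivalence in formula (29), the
domain-independent axiom (13) is automatically re-written into the following specialised form:

$$
\begin{aligned}
&\text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T{+}1) \Leftarrow \\
&\quad \text{happens}(\text{active}(ID_1),\, T)\ \wedge \\
&\quad \neg\text{happens}(\text{running}(ID_2),\, T)\ \wedge \\
&\quad \text{close}(ID_1, ID_2, 25, T) \\[4pt]
&\text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T{+}1) \Leftarrow \\
&\quad \text{happens}(\text{inactive}(ID_1),\, T)\ \wedge \\
&\quad \neg\text{happens}(\text{running}(ID_2),\, T)\ \wedge \\
&\quad \neg\text{happens}(\text{active}(ID_2),\, T)\ \wedge \\
&\quad \text{close}(ID_1, ID_2, 25, T)
\end{aligned}
\tag{31}
$$

Similarly, the inertia axiom (14) can be re-written according to (30) as follows:

$$
\begin{aligned}
&\text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T{+}1) \Leftarrow \\
&\quad \text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T)\ \wedge \\
&\quad \neg\Big( \big(\text{happens}(\text{walking}(ID_1),\, T) \wedge \neg\text{close}(ID_1, ID_2, 25, T)\big)\ \vee \\
&\quad\quad\ \ \text{happens}(\text{running}(ID_1),\, T)\ \vee \\
&\quad\quad\ \ \text{happens}(\text{exit}(ID_1),\, T) \Big)
\end{aligned}
\tag{32}
$$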

The result of this compilation procedure replaces the original set of domain-independent
axioms and domain-dependent CE definitions with a logically stronger knowledge base. The
rules in the resulting knowledge base form the template that MLN will use to produce ground
Markov networks. This process produces considerably more compact ground Markov
networks, as fewer clauses need to be grounded. Moreover, the predicates initiatedAt and
terminatedAt are eliminated and the corresponding random variables are not added to the
network. This reduction decreases substantially the space of possible worlds, since the
target random variables of the network (Y in equation (8)) are limited only to the
corresponding holdsAt ground predicates. Specifically, the space of possible worlds is reduced
from $2^{3 \times |\mathcal{T}| \times |\mathcal{F}|}$ to $2^{|\mathcal{T}| \times |\mathcal{F}|}$, where $|\mathcal{T}|$ and $|\mathcal{F}|$ denote the number of distinct time-points
and fluents, respectively. These reductions improve the computational performance of the
probabilistic inference. Furthermore, due to the reduced space of possible worlds, the same
number of sampling iterations results in better probability estimates.
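
To give a feeling for the size of this reduction, the short sketch below counts the unknown ground atoms before and after compilation. It assumes that each of the three predicates (holdsAt, initiatedAt, terminatedAt) contributes one binary random variable per fluent and time-point; the sizes used (100 time-points, 2 fluents) and the function name are hypothetical.

```python
def num_ground_atoms(num_timepoints, num_fluents, predicates):
    """One binary random variable per predicate, fluent and time-point."""
    return len(predicates) * num_timepoints * num_fluents


# Hypothetical sizes: 100 time-points and 2 fluents (e.g. meeting and moving).
before = num_ground_atoms(100, 2, ["holdsAt", "initiatedAt", "terminatedAt"])
after = num_ground_atoms(100, 2, ["holdsAt"])

# The number of possible worlds is 2 ** (number of unknown ground atoms).
print(before, after)                   # exponents before and after compilation
print((2 ** before) // (2 ** after))   # factor by which the space shrinks
```

Under these assumptions the exponent drops from 600 to 200, so the space of possible worlds shrinks by a factor of 2 raised to the power 400.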

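Formally, the resulting knowledge base is composed of rules having the following form:

$$
\begin{aligned}
&\text{holdsAt}(\text{fluent}_i(X, Y),\, T{+}1) \Leftarrow \\
&\quad \text{happens}(\text{event}_i(X),\, T) \wedge \dots \wedge \text{Conditions}[X, Y, T]
\end{aligned}
\tag{33}
$$

$$
\begin{aligned}
&\neg\text{holdsAt}(\text{fluent}_i(X, Y),\, T{+}1) \Leftarrow \\
&\quad \text{happens}(\text{event}_j(X),\, T) \wedge \dots \wedge \text{Conditions}[X, Y, T]
\end{aligned}
\tag{34}
$$

$$
\begin{aligned}
&\text{holdsAt}(\text{fluent}_i(X, Y),\, T{+}1) \Leftarrow \\
&\quad \text{holdsAt}(\text{fluent}_i(X, Y),\, T)\ \wedge \\
&\quad \neg\big( (\text{happens}(\text{event}_j(X),\, T) \wedge \dots \wedge \text{Conditions}[X, Y, T]) \vee \dots \big)
\end{aligned}
\tag{35}
$$

$$
\begin{aligned}
&\neg\text{holdsAt}(\text{fluent}_i(X, Y),\, T{+}1) \Leftarrow \\
&\quad \neg\text{holdsAt}(\text{fluent}_i(X, Y),\, T)\ \wedge \\
&\quad \neg\big( (\text{happens}(\text{event}_i(X),\, T) \wedge \dots \wedge \text{Conditions}[X, Y, T]) \vee \dots \big)
\end{aligned}
\tag{36}
$$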

The rules in (33)-(36) can be separated into two subsets. The former set Σ contains
specialised definitions of axioms (13) and (15), specifying when a fluent holds (or does not
hold) when its initiation (or termination) conditions are met. The latter set Σ' contains
specialised definitions of the inertia axioms (14) and (16), specifying whether a specific
fluent continues to hold or not at any instance of time.

The knowledge compilation procedure reduces the size of the produced network, based
only on the rules of the knowledge base. Given a narrative of SDE, further reduction can be
performed during the ground network construction. All ground predicates that appear in the
given narrative are replaced by their truth value. Ground clauses that become tautological
are safely removed, as they remain satisfied in all possible worlds (Singla & Domingos, 2005;
Shavlik & Natarajan, 2009). Therefore, the resulting network comprises only the remaining
ground clauses, containing ground predicates with unknown truth states, i.e. groundings
of holdsAt.
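
The evidence-based simplification described above can be sketched as follows. The clause representation (a list of (atom, sign) literals), the function name `simplify` and the two example groundings are assumptions chosen for illustration; they are not the data structures of any particular MLN system.

```python
def simplify(clause, evidence):
    """Simplify one ground clause given evidence.

    clause: list of (atom, positive) literals, read as a disjunction.
    evidence: dict mapping known ground atoms to their truth value.
    Returns None if the clause is already satisfied by the evidence,
    otherwise the list of literals whose atoms are still unknown.
    """
    remaining = []
    for atom, positive in clause:
        if atom in evidence:
            if evidence[atom] == positive:
                return None          # satisfied in every possible world
            # Otherwise the literal is false under the evidence: drop it.
        else:
            remaining.append((atom, positive))
    return remaining


# Hypothetical groundings at T=10 of two inertia clauses for `meeting`.
evidence = {"happens(walking(id1),10)": False,
            "happens(running(id1),10)": False,
            "happens(exit(id1),10)": False,
            "close(id1,id2,25,10)": False}
clause_a = [("happens(walking(id1),10)", True),
            ("happens(running(id1),10)", True),
            ("happens(exit(id1),10)", True),
            ("holdsAt(meeting(id1,id2),10)", False),
            ("holdsAt(meeting(id1,id2),11)", True)]
clause_b = [("close(id1,id2,25,10)", False)] + clause_a[1:]

print(simplify(clause_a, evidence))   # only the two holdsAt literals remain
print(simplify(clause_b, evidence))   # None: removed from the ground network
```

After simplification, the first clause mentions only groundings of holdsAt, while the second is satisfied by the evidence alone and can be dropped.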

6. The Behaviour of Probabilistic Event Calculus 

As mentioned in Section 4, weighted formulas in MLN define soft constraints, allowing
some worlds that do not satisfy these formulas to become likely. For example, consider a
knowledge base of Event Calculus axioms and CE definitions (e.g. meeting and moving)
compiled in the form of rules (33)-(36). Given a narrative of SDE, the probability of a CE
to hold at a specific time-point is determined by the probabilities of worlds in which this
CE holds. Each world has some probability which is proportional to the exponentiated sum
of the weights of the ground clauses that it satisfies. Consequently, the probability of a CE
to hold at a specific instance of time depends on the corresponding constraints of the ground
Markov network. By treating the rules in the Σ and Σ' sets as either hard or soft constraints,
the behaviour of the Event Calculus changes.

6.1 Soft-constrained rules in Σ

In order to illustrate how the probability of a CE is affected when its initiation or termination
conditions are met, consider the case that the rules in Σ are soft-constrained while the
inertia rules in Σ' remain hard-constrained. By soft-constraining the rules in Σ, the worlds
violating their clauses become probable. This situation reduces the certainty with which
a CE is recognised when its initiation or termination conditions are met. For example,
consider that the initiation rules (31) of the meeting CE are associated with weights. As
a result, the meeting activity is initiated with some certainty, causing the CE to hold with
some probability. Depending on the strength of the weights, the worlds that violate these
rules become more or less likely. Thus, we can control the level of certainty with which a
CE holds or not under the same conditions.

When the initiation conditions are met, the probability of the CE to hold increases.
Similarly, when the termination conditions are satisfied, the probability of the CE
decreases. At the same time, all worlds violating hard-constrained inertia rules in Σ' are
rejected. In the presence of SDE leading to the partial satisfaction (i.e. satisfaction of a
possibly empty strict subset) of the initiation/termination conditions, the probability of a CE
to hold is not affected. The inertia is deterministically retained as in crisp logic. In Figure
1, for instance, the fluent meeting does not hold at time 0. According to the narrative of
SDE, the meeting activity is initiated at time-points 3 and 10, e.g. satisfying the constraints
imposed by rules (18) and (19) respectively. At time 20, the meeting activity is terminated
by the conditions of rule (20). In crisp Event Calculus, denoted as EC_crisp, after its first
initiation the meeting activity holds with absolute certainty. The second initiation at time
10 does not cause any change and the CE continues to hold. The termination at time 20
causes the CE to not hold, again with absolute certainty, for the remaining time-points. In
MLN-EC_HI (hard-constrained inertia rules), however, the rules in Σ are soft-constrained.
As a result, at time 4 the probability of meeting to hold increases to some value. Similar to
EC_crisp, the inertia is fully retained and the probability of meeting deterministically persists
in the interval 4 to 10. In contrast to EC_crisp, the second initiation at time 10 increases the
certainty of meeting to hold. As a result, the probability of meeting is higher in the interval
11 to 20. In the same manner, the termination at 20 reduces the probability of meeting and
the CE continues to hold with some reduced probability.

Figure 1: The probability of the meeting CE given some SDE narrative. EC_crisp is a crisp
Event Calculus. MLN-EC_HI is a probabilistic Event Calculus where rules in Σ
are soft-constrained, while the inertia rules in Σ' remain hard-constrained.
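
For reference, the crisp behaviour described for this narrative (initiations at time-points 3 and 10, termination at 20) can be reproduced with a few lines of Python. This is a sketch of crisp inertia only, not of MLN-EC_HI; the function name is hypothetical, and giving initiation priority when a fluent is initiated and terminated at the same time-point is an assumption of the sketch.

```python
def crisp_holds(initiations, terminations, horizon, initially=False):
    """holds[t] for t = 0..horizon under crisp Event Calculus inertia.

    initiations, terminations: sets of time-points at which the initiation or
    termination conditions of the fluent are satisfied.
    """
    holds = [initially]
    for t in range(horizon):
        if t in initiations:
            holds.append(True)        # initiated at t, so it holds at t+1
        elif t in terminations:
            holds.append(False)       # terminated at t, so it fails at t+1
        else:
            holds.append(holds[t])    # inertia: nothing relevant happened
    return holds


holds = crisp_holds(initiations={3, 10}, terminations={20}, horizon=25)
print([t for t, h in enumerate(holds) if h])   # time-points 4 to 20
```

The second initiation at time-point 10 leaves the crisp result unchanged, which is exactly the point where the probabilistic variant differs.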

6.2 Soft-constrained inertia rules in Σ'

To illustrate how the behaviour of inertia is affected by soft-constraining the corresponding
rules in Σ', consider that the rules in Σ are hard-constrained. Consequently, when the
initiation (or termination) conditions are met a CE holds (or does not hold) with absolute
certainty. In the case of SDE leading to the partial satisfaction of the initiation/termination
conditions, the probability of a CE depends only on its inertia rules in Σ'. If the inertia
of holdsAt remains hard-constrained, the worlds in which an initiated CE does not hold
are rejected. Similarly, by keeping the inertia of ¬holdsAt hard-constrained, all worlds in
which a terminated CE holds are rejected. By soft-constraining these rules we control the
strength of the inertia constraints. In the presence of SDE leading to the partial satisfaction
of the corresponding initiation/termination conditions, a CE may not persist with absolute
certainty, as worlds that violate these constraints become likely. The persistence of holdsAt
and ¬holdsAt is gradually lost over successive time-points. When allowing the constraints
of holdsAt inertia to be violated, the probability of a CE gradually drops. Similarly, by
allowing the constraints representing the inertia of ¬holdsAt to be violated, the probability
of a CE gradually increases. The lower the value of the weight on the constraint, the more
probable the worlds that violate the constraints become. In other words, weight values in
Σ' cause CE to persist for longer or shorter time periods.

(a) The CE initially holds.
(b) The CE initially does not hold.

Figure 2: In both figures SDE occur leading to the partial satisfaction of the
initiation/termination conditions of a CE in the interval 0 to 100. In the left figure
the CE holds at time 0 with absolute certainty, while in the right figure the CE
does not hold at time 0.

Since the sum of the probabilities of holdsAt and ¬holdsAt for a specific CE is always
equal to 1, the relative strength of holdsAt and ¬holdsAt rules in Σ' determines the type of
inertia in the model. The following two general cases can be distinguished.

Equally strong inertia constraints. All rules in Σ' are equally soft-constrained, i.e. they
are associated with the same weight value. Consequently, both inertia rules of holdsAt and
¬holdsAt for a particular CE impose constraints of equal importance, allowing worlds that
violate them to become likely. As a result, in the absence of useful evidence, the probability
of holdsAt will tend to approximate the value 0.5. For example, Figure 2(a) illustrates
soft persistence for the meeting CE when it holds with absolute certainty at time-point 0,
and thereafter nothing happens to initiate or terminate it. The curve MLN-EC_SIeq
(soft-constrained inertia rules with equal weights) shows the behaviour of inertia in this case. As
time evolves, the probability of meeting appears to gradually drop, converging to 0.5. If we
assign weaker weights to the inertia axioms, shown by the MLN-EC_SIeq^weak curve, the
probability of meeting drops more sharply. Similarly, in Figure 2(b), the meeting CE is assumed
to not hold initially. As time evolves, the probability of meeting gradually increases up to
the value 0.5, as shown by the MLN-EC_SIeq and MLN-EC_SIeq^weak curves respectively.

Inertia constraints of different strength. When the inertia rules of holdsAt and
¬holdsAt for a particular CE in Σ' have different weights, the probability of the CE will
no longer converge to 0.5. Since the weights impose constraints with different confidence,
worlds violating the stronger constraints become less likely than worlds violating the weaker
ones. Depending on the relative strength of the weights, the probability of the CE may
converge either to 1.0 or 0.0. The relative strength of the weights affects also the rate at which
the probability of CE changes. As an extreme example, in Figure 2(a), the rules for the
inertia of ¬holdsAt remain hard-constrained. By assigning weights to the rules for the inertia of
holdsAt, the persistence of the CE is lost. Since the inertia constraints of holdsAt are weaker
than the constraints of ¬holdsAt, worlds violating the former set of constraints will always
be more likely. As a result, the probability of the CE will continue to drop, even below 0.5.
The curves MLN-EC_SIh (soft-constrained inertia of holdsAt) and MLN-EC_SIh^weak (weaker
holdsAt inertia constraints) illustrate how the probability of meeting drops sharply towards
0.0. The weaker the constraints (MLN-EC_SIh^weak), the steeper the drop. In a similar manner,
when the inertia constraints of ¬holdsAt are weaker than the constraints of holdsAt, the
probability of CE gradually increases and may reach values above 0.5, as presented by the
MLN-EC_SI¬h (soft-constrained inertia of ¬holdsAt) and MLN-EC_SI¬h^weak (weaker ¬holdsAt
inertia constraints) cases in Figure 2(b).
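
The two cases above can be explored on a toy chain of holdsAt variables by exact enumeration. The sketch below is a simplification made for illustration: it assumes that no initiation or termination condition is ever met, so that each time step contributes only the two ground inertia clauses, and it computes marginals by brute force rather than by the MC-SAT sampling used in the experiments. The function name, the weights and the horizon are hypothetical.

```python
import math
from itertools import product


def holds_marginals(n, w_holds, w_not_holds, initially_holds=True):
    """P(holdsAt at t), t = 0..n, when only the inertia clauses matter.

    Each step t -> t+1 contributes two ground clauses:
      (not h[t]) or h[t+1]    weight w_holds       (inertia of holdsAt)
      h[t] or (not h[t+1])    weight w_not_holds   (inertia of not holdsAt)
    A weight of math.inf turns the clause into a hard constraint.
    """
    totals = [0.0] * (n + 1)
    z = 0.0
    for rest in product([False, True], repeat=n):   # enumerate all worlds
        h = (initially_holds,) + rest
        log_weight = 0.0
        for t in range(n):
            clauses = (((not h[t]) or h[t + 1], w_holds),
                       (h[t] or (not h[t + 1]), w_not_holds))
            for satisfied, w in clauses:
                if math.isinf(w):
                    if not satisfied:
                        log_weight = -math.inf      # world is rejected
                elif satisfied:
                    log_weight += w
        weight = math.exp(log_weight)   # world weight: exp(sum of weights)
        z += weight
        for t in range(n + 1):
            if h[t]:
                totals[t] += weight
    return [x / z for x in totals]


equal = holds_marginals(10, 2.0, 2.0)            # equally strong inertia
weaker = holds_marginals(10, 0.5, 0.5)           # weaker, still equal
hard_neg = holds_marginals(10, 2.0, math.inf)    # hard inertia of not holdsAt

for name, p in (("equal", equal), ("weaker", weaker), ("hard_neg", hard_neg)):
    print(name, [round(x, 3) for x in p])
```

In this toy setting the equal-weight marginals decay towards 0.5 without crossing it, the weaker weights decay faster, and keeping the inertia of ¬holdsAt hard lets the probability fall below 0.5 by the end of the horizon. The exact values depend on the chosen weights and horizon and are not meant to reproduce Figure 2.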

As explained in Section 5.2, the inertia rule of a specific CE may consist of a large
body of conditions, e.g. rule (32). Depending on the number of conditions involved, the
inertia rule of a specific CE may be decomposed into several clauses, each corresponding to
a different subset of conditions. For instance, the following two clauses are added to Σ' by
the inertia rule (32):

$$
\begin{aligned}
&\text{happens}(\text{walking}(ID_1),\, T) \vee \text{happens}(\text{running}(ID_1),\, T) \vee \text{happens}(\text{exit}(ID_1),\, T)\ \vee \\
&\neg\text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T) \vee \text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T{+}1)
\end{aligned}
\tag{37}
$$

$$
\begin{aligned}
&\neg\text{close}(ID_1, ID_2, 25, T) \vee \text{happens}(\text{running}(ID_1),\, T) \vee \text{happens}(\text{exit}(ID_1),\, T)\ \vee \\
&\neg\text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T) \vee \text{holdsAt}(\text{meeting}(ID_1, ID_2),\, T{+}1)
\end{aligned}
\tag{38}
$$

The above clauses contain literals from the termination rules of the meeting CE. Often,
when SDE that lead to the partial satisfaction of the initiation/termination conditions
occur, some of these clauses become trivially satisfied. For example, at time-point 10 both
persons $ID_1$ and $ID_2$ are active, while their distance is above the threshold of 25 pixels,
i.e. close($ID_1$, $ID_2$, 25, 10) = False. Consequently, the grounding of clause (38) at
time-point 10 remains trivially satisfied for all possible worlds. Although the meeting CE is not
terminated at time-point 10, because clause (37) is not trivially satisfied, the satisfaction of clause
(38) reduces the probability of holdsAt for the meeting CE. This is because the inertia at
time-point 10 is now supported only by the satisfaction of the ground clause (37). In other
words, the difference between the probabilities of worlds that violate the inertia of holdsAt
and worlds that do not is reduced.
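
The decomposition of the inertia rule into two clauses can be checked mechanically with a truth table. The sketch below reads rule (32) as "holdsAt at T+1 follows from holdsAt at T together with the negated termination conditions" and compares it with the conjunction of clauses (37) and (38); the propositional variable names are shorthand introduced for this check only.

```python
from itertools import product


def rule_32(walking, close, running, exit_, holds_t, holds_t1):
    """holdsAt(T+1) <= holdsAt(T) and not(termination conditions)."""
    terminated = (walking and not close) or running or exit_
    body = holds_t and not terminated
    return (not body) or holds_t1


def clause_37(walking, close, running, exit_, holds_t, holds_t1):
    return walking or running or exit_ or (not holds_t) or holds_t1


def clause_38(walking, close, running, exit_, holds_t, holds_t1):
    return (not close) or running or exit_ or (not holds_t) or holds_t1


# Compare the rule with the two clauses on all 2**6 truth assignments.
equivalent = all(
    rule_32(*world) == (clause_37(*world) and clause_38(*world))
    for world in product([False, True], repeat=6)
)
print(equivalent)
```

The check also makes the example in the text concrete: with close set to False, clause_38 is true whatever the values of the two holdsAt variables, so only clause_37 still constrains them.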

To illustrate this phenomenon, consider the example cases in Figure 3(a) where only the
rules about the inertia of holdsAt are soft-constrained. Both MLN-EC_SIh and MLN-EC'_SIh
cases share the same knowledge base. In the MLN-EC_SIh case, the occurrence of SDE causes
none of the inertia clauses to become trivially satisfied. In the MLN-EC'_SIh case, however,
the SDE are randomly generated and cause a different subset of inertia clauses to become
trivially satisfied at each time-point. In both cases the probability of the CE is reduced. In
contrast to MLN-EC_SIh, however, the inertia in MLN-EC'_SIh drops more sharply, as some
of the clauses in Σ' are trivially satisfied by the given SDE. Additionally, the probability
of the CE to hold in MLN-EC'_SIh persists at a different level in each time-point, since
different subsets of clauses become trivially satisfied each time. Similarly, in Figure 3(b)
the rules about the inertia of ¬holdsAt are soft-constrained. In contrast to MLN-EC_SI¬h,
in MLN-EC'_SI¬h the occurrence of SDE leads to the partial satisfaction of the initiation
conditions, causing the inertia to persist with different confidence in each time-point and
increasing the probability of the CE to hold more sharply.

Figure 3: In both figures no useful SDE occur in the interval 0 to 100. In both MLN-EC_SIh
and MLN-EC_SI¬h none of the inertia clauses in Σ' become trivially satisfied by
the SDE. However, in MLN-EC'_SIh and MLN-EC'_SI¬h some inertia clauses are
trivially satisfied by the SDE. In the left figure the CE holds at time 0 with
absolute certainty, while in the right figure the CE does not hold at time 0.

Having analysed the effect of softening the inertia rules, it is worth noting that in many
real cases the entire knowledge base may be soft-constrained. In this case, since the rules in
Σ are soft-constrained, CE are not initiated or terminated with absolute certainty. At
the same time, CE do not persist with certainty, as the rules in Σ' are also soft-constrained.

Depending on the requirements of the target application, various policies regarding the
soft-constraining of the knowledge base may be adopted. This flexibility is one of the
advantages of the proposed method. Furthermore, it should be stressed that in a typical
event recognition application the knowledge base will contain a large number of clauses.
The strength of a constraint imposed by a clause is also affected by the weights of other
clauses with which it shares the same predicates. Due to these interdependencies, the manual
setting of weights is bound to be suboptimal and cumbersome. Fortunately, the weights
can be estimated automatically from training sets, using standard parameter optimisation
methods.

7. Evaluation

7.1 Setup

In this section we demonstrate MLN-EC in the domain of video activity recognition (see footnote 5).
Based on imperfect CE definitions, the aim of the experiments is to assess the
effectiveness of MLN-EC in recognising CE that occur among people, given a sequence of SDE
as evidence. In the experiments we use the open-source MLN framework Alchemy (Kok
et al., 2005). For comparison purposes, we also include in the experiments a logic
programming Event Calculus activity recognition method (Artikis et al., 2010b), which we call here
EC_crisp. Both EC_crisp and MLN-EC contain a similar knowledge base of CE definitions.
In particular, the CE definitions of meeting and moving of EC_crisp are translated into
first-order logic using the formulation proposed in Section 5 and are associated with weights.
The definition of meeting is given by formulas (18)-(22), while that of moving is given by
formulas (23)-(28).

The input to both MLN-EC and EC_crisp consists of a time-stamped sequence of SDE,
i.e. active, inactive, walking, running, enter and exit, in the form of a narrative of ground
happens predicates. In MLN-EC, the spatial constraints close and orientationMove are
precomputed and provided as input.

The output of both methods consists of a sequence of ground holdsAt predicates,
indicating which CE are recognised. EC_crisp performs crisp reasoning and thus all CE are
recognised with absolute certainty. On the other hand, MLN-EC performs marginal
inference to compute the conditional probability P(holdsAt(CE, T) = True | SDE), using the
MC-SAT algorithm (Poon & Domingos, 2006). Consequently, all recognised CE are
associated with some probability value.

MLN-EC is trained discriminatively and its performance is evaluated by 10-fold
cross-validation. From the 28 videos of the CAVIAR dataset, we have extracted 19 sequences
that are annotated with the meeting and/or moving CE. The rest of the sequences in the
dataset are ignored, as they do not contain examples of the two target CE. Out of the 19
sequences, 8 are annotated with both moving and meeting activities, 9 are annotated only
with the moving CE and 2 only with the meeting CE. As shown in Table 3, each training sequence
consists of input SDE (ground happens), precomputed spatial constraints between pairs of
people (ground close and orientationMove), as well as the corresponding CE annotations
(ground holdsAt). Negated predicates in the training sequence state that the truth value of
the corresponding predicate is False. The total length of all 19 sequences is 12869 frames.
Each frame is annotated with the occurrence or not of each of the two CE, leading to a total
of 25738 annotated example instances. The training data, therefore, contain example instances that
represent when a CE holds or not at a specific time-point. There are 6272 example instances
in which moving holds and 3622 in which meeting holds. Using the knowledge compilation
method that was presented in Section 5.2, predicates that do not explicitly appear in the
training set, i.e. the predicates initiatedAt and terminatedAt, are removed. As a result,
the training of MLN-EC is fully supervised and performed by the second-order Diagonal
Newton method (Lowd & Domingos, 2007).

5. The CE definitions and the MLN formatted version of the dataset can be found at:
http://www.iit.demokritos.gr/~anskarl/pub/mlnec

Simple Derived Events              | Composite Events
-----------------------------------+---------------------------------
happens(walking(id1), 100)         | holdsAt(moving(id1, id2), 100)
happens(walking(id2), 100)         | holdsAt(moving(id2, id1), 100)
orientationMove(id1, id2, 100)     |
close(id1, id2, 24, 100)           |
-----------------------------------+---------------------------------
happens(active(id1), 101)          | holdsAt(moving(id1, id2), 101)
happens(walking(id2), 101)         | holdsAt(moving(id2, id1), 101)
orientationMove(id1, id2, 101)     |
¬close(id1, id2, 24, 101)          |
-----------------------------------+---------------------------------
happens(walking(id1), 200)         | ¬holdsAt(moving(id1, id2), 200)
happens(running(id2), 200)         | ¬holdsAt(moving(id2, id1), 200)
¬orientationMove(id1, id2, 200)    |
¬close(id1, id2, 24, 200)          |

Table 3: Example training set for CE moving.



Scenarios    | Description
-------------+-----------------------------------------------------------------
MLN-EC_HI    | All inertia rules in Σ' are hard-constrained.
MLN-EC_SIh   | The inertia rules of holdsAt are soft-constrained, while the
             | rest of Σ' remain hard-constrained.
MLN-EC_SI    | All inertia rules in Σ' are soft-constrained.

Table 4: General configurations for the inertia rules in Σ'.



7.2 Results 

In the following experiments, MLN-EC is tested under three different scenarios (MLN-EC_HI,
MLN-EC_SIh and MLN-EC_SI, see Table 4 for a description), in which the rules in Σ are
soft-constrained while the inertia rules in Σ' are either soft or hard. The evaluation results
are shown in Figures 4, 5 and 6, in terms of F1 score, precision and recall respectively, for
threshold values that range between 0.1 and 0.9, where any CE with probability above the
threshold is considered to be recognised. A snapshot of the performance of MLN-EC is
also presented in Tables 5(a) and 5(b), in terms of true positives (TP), false positives (FP),
false negatives (FN), precision, recall and F1, using the threshold value 0.5. All reported
experiment statistics are micro-averaged over the instances of recognised CE.
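
The thresholding and scoring step can be sketched as follows. The function name `micro_scores` and the example probabilities are made up for illustration; the sketch pools all instances before computing the scores, which is one common reading of micro-averaging, and it is not the evaluation script used for the reported results.

```python
def micro_scores(probabilities, annotations, threshold=0.5):
    """Threshold CE probabilities and score them against the annotation.

    probabilities: P(holdsAt) for every (CE instance, time-point) pair.
    annotations: the corresponding ground-truth booleans.
    """
    tp = fp = fn = 0
    for p, truth in zip(probabilities, annotations):
        recognised = p > threshold      # recognised if above the threshold
        if recognised and truth:
            tp += 1
        elif recognised and not truth:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


# Made-up probabilities and annotations, for illustration only.
probs = [0.9, 0.6, 0.4, 0.2]
truth = [True, False, True, False]
print(micro_scores(probs, truth, threshold=0.5))
```

Sweeping `threshold` from 0.1 to 0.9 with such a function yields the kind of curves summarised in Figures 4, 5 and 6.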

EC_crisp achieves a similar F1 score for both activities (Figure 4). However, in terms of
precision and recall the situation is quite different, revealing two different cases of imperfect
CE definitions. The precision for moving is 22 percentage points higher than that of meeting.
The opposite holds for recall, with the recall for moving being 21.6 percentage points lower
than that of meeting. The lower recall values for moving indicate a larger number of
unrecognised moving activities (FN). In some example sequences moving appears to be
initiated late, producing many false negatives. Additionally, the termination rules of moving
cause the CE to be prematurely terminated in some cases. For example, when the distance
of two persons that are moving together becomes greater than 34 pixels for a few frames,
rule (24) terminates moving. On the other hand, compared to moving, the definition of
meeting results in a larger number of erroneously recognised meeting activities (FP). The
initiation rule (19), for example, causes the meeting activity to be initiated earlier than it
should. Another issue caused by the definitions of meeting and moving is that these CE
may overlap. According to rules (18)-(28), the initiation of moving does not cause the
termination of meeting. Consider, for example, a situation where two people meet for a
while and thereafter they move together. During the interval in which moving is detected,
meeting will also remain detected, as it is not terminated and the law of inertia holds.
However, according to the annotation of the CAVIAR team these activities do not happen
concurrently.

Figure 4: F1 scores using different threshold values for the moving and meeting CE.

Figure 5: Precision scores using different threshold values for the moving and meeting CE.

Figure 6: Recall scores using different threshold values for CE moving and meeting.

Table 5: Results for the moving and meeting CE for the MLN-EC_HI, MLN-EC_SIh and
MLN-EC_SI cases compared to EC_crisp, using threshold 0.5.

In the first MLN-EC scenario, indicated by MLN-EC_HI, each rule in Σ is soft-constrained,
i.e. it is associated with a weight value after training. Those weights control the certainty
with which a CE holds when its initiation or termination conditions are satisfied. The rules
in Σ', however, remain hard-constrained and thus the behaviour of inertia for both CE
remains deterministic and cannot be adjusted. Compared to EC_crisp, MLN-EC_HI achieves
a better F1 score for the moving CE, for most threshold values (see Figure 4(a)), including
0.5 (Table 5(a)). The recall of moving (Figure 6(a)) is generally higher for MLN-EC_HI
than EC_crisp, at the cost of lower precision (Figure 5(a)). For threshold 0.5, the recall
of MLN-EC_HI is higher by 17.7 percentage points than EC_crisp while precision is lower by
6 points, leading the F1 score to be higher for MLN-EC_HI by 8 points (Table 5(a)). This
improvement in recall is caused by the weights that are learned for the termination
conditions, which prevent the moving CE from terminating prematurely. However, MLN-EC_HI
performs worse than EC_crisp in the case of meeting, as shown in Figure 4(b) and Table
5(b). The combination of hard-constrained inertia rules with the fact that meeting does
not terminate when moving starts pushed the weights of the initiation rules to very low
values during training. This situation results in many unrecognised meeting activities and
low precision and recall values (Figures 5(b) and 6(b)).

In the MLN-EC sih scenario, while Σ remains soft-constrained, the inertia rules of
holdsAt in Σ′ are also soft-constrained. As a result, the probability of a CE tends to
decrease, even when the required termination conditions are not met and nothing else is
happening. This scenario is more suitable for our target activity recognition task and
MLN-EC learns a model with a high F1 score for both CE. In order to explain the effect
of soft-constraining the inertia of holdsAt, we again use the example of meeting being
recognised and moving being recognised thereafter. Since meeting is not terminated, it
continues to hold and overlaps with moving. During the overlap, all occurring SDE are
irrelevant with respect to meeting and cannot cause any initiation or termination. As a
result, the recognition probability of meeting cannot be reinforced by re-initiation. As
shown in Section 6, in such circumstances the recognition probability of meeting gradually
decreases. The performance of MLN-EC sih for the CE moving is very similar to that of
MLN-EC hi, as shown in Figure 4(a) and Table 5(a). Using a threshold equal to 0.5, recall
is improved by 19.5 percentage points while precision falls by 6.3 points, resulting in an
8.9-point increase in the F1 measure. For the CE meeting, however, the performance of
MLN-EC sih is significantly better than that of MLN-EC hi. It achieves higher precision
than ECcrisp at the same level of recall. At the 0.5 threshold value, precision increases
by 6.8 percentage points over ECcrisp, while recall falls by only 1 point and thus F1 is
higher by 3.5 points.
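
The gradual decrease described above can be reproduced on a toy model. The sketch below is an illustration under strong simplifying assumptions, not the model learned in the experiments: it considers a single fluent that holds at time 0, no initiation or termination events, a soft rule holdsAt(t) => holdsAt(t+1) with a hypothetical weight w_inertia, and a hard rule forbidding the fluent from starting to hold without an initiation. Marginals are computed exactly by enumerating all possible worlds, each weighted by the exponential of the total weight of its satisfied soft ground formulas.

```python
import itertools
import math


def holds_marginals(w_inertia, horizon):
    """Exact P(holdsAt(t) = True) for t = 1..horizon, by enumerating worlds."""
    totals = [0.0] * horizon
    z = 0.0  # partition function
    for world in itertools.product([True, False], repeat=horizon):
        states = (True,) + world  # evidence: the fluent holds at t = 0
        satisfied = 0
        possible = True
        for prev, cur in zip(states, states[1:]):
            if not prev and cur:
                # Hard rule violated: the fluent cannot start to hold
                # without an initiation, so this world has zero probability.
                possible = False
                break
            if (not prev) or cur:
                # Soft rule holdsAt(t) => holdsAt(t+1) is satisfied.
                satisfied += 1
        if not possible:
            continue
        weight = math.exp(w_inertia * satisfied)
        z += weight
        for t, value in enumerate(world):
            if value:
                totals[t] += weight
    return [total / z for total in totals]
```

For example, `holds_marginals(2.0, 5)` falls steadily from about 0.92 at the first time-point to about 0.60 at the last one, whereas a much larger weight keeps the marginals close to 1 and approximates deterministic inertia. In this toy setting the values also depend on the chosen horizon, which is an artefact of the simplification.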

Finally, in the MLN-EC si scenario, the entire knowledge base is soft-constrained. The
weights in Σ allow full control over the confidence that a CE holds when its initiation or
termination conditions are met. Additionally, by soft-constraining the rules in Σ′,
MLN-EC si provides fully probabilistic inertia. However, this flexibility comes at the
cost of an increase in the number of parameters to be estimated from data, as all clauses
in the knowledge base are now soft-constrained. As a result, MLN-EC si requires more
training data. In terms of F1 score, MLN-EC si performs almost the same as ECcrisp, but
worse than MLN-EC sih for both CE.

The three scenarios presented above, MLN-EC hi, MLN-EC sih and MLN-EC si, illustrated
the benefits to be gained by softening the constraints and performing probabilistic
inference in event recognition. In contrast to ECcrisp, an important characteristic of
MLN-EC is that multiple successive initiations (or terminations) can increase (or decrease)
the recognition probability of a CE. By softening the CE definitions, premature initiation
or termination can be avoided. As explained above, the weights learned for the termination
definitions of the moving CE reduced the number of unrecognised moving activities.

The choice of rules to be softened significantly affects the event recognition accuracy. In
the presented application, for example, the MLN-EC sih setting is clearly the best choice,
as softening the inertia of holdsAt provides advantages over crisp recognition. Depending
on the target application, different types of inertia rules may be softened, varying the
inertia behaviour from deterministic to completely probabilistic. This is a key feature of
our approach.

8. Related work 

8.1 Symbolic Event Recognition 

Symbolic methods can naturally and compactly represent composite event (CE) definitions
for event recognition; a list of applications is outlined by Artikis et al. (2010a). Their
formal, declarative semantics supports the representation of complex relations between
events, including the definition of hierarchical CE, as well as various temporal
constraints such as concurrency, persistency, synchronicity, sequences or arbitrary
non-sequential constraints, e.g. interval algebra relations (Allen, 1983). The expressive
symbolic representations are combined with powerful reasoning methods in order to recognise
the CE of interest.

The chronicle recognition system (CRS) (Dousson & Maigat, 2007), for example, is a
symbolic event recognition method that has been applied to a variety of problems, such as
cardiac arrhythmia recognition (Callens et al., 2008), computer network monitoring
(Dousson, 1996; Dousson & Maigat, 2007), airport ground traffic monitoring (Choppy et al.,
2009), etc. CE are represented using a declarative temporal language and translated into
temporal constraint networks (TCN) in order to perform efficient event recognition. A TCN
is a directed symbolic network, where vertices correspond to instantiations of some event
(CE or SDE) and edges represent temporal delays imposed between the involved events.
Similarly, in the domain of video interpretation, the scenario recognition method (Vu et al.,
2003) translates CE definitions into TCN. However, for reasons of efficiency the networks
are automatically decomposed into several sub-networks. The CRS language has also been
translated into Petri-Nets which are executed to recognise CE (Choppy et al., 2009).
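
To make the TCN description more concrete, the following is a minimal sketch of such a network as a data structure. It is not the CRS implementation: the class and function names, the event names and the representation of each delay as a closed interval [min_delay, max_delay] are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayEdge:
    """Directed edge: target must occur min_delay..max_delay after source."""
    source: str
    target: str
    min_delay: int
    max_delay: int


def satisfies(edges, timestamps):
    """Check whether the observed event timestamps satisfy every delay edge."""
    for edge in edges:
        if edge.source not in timestamps or edge.target not in timestamps:
            return False  # a required event instance has not been observed
        delay = timestamps[edge.target] - timestamps[edge.source]
        if not (edge.min_delay <= delay <= edge.max_delay):
            return False
    return True


# Hypothetical CE: a person enters, stops within 1..10 time-points,
# and then leaves within 0..5 time-points of stopping.
network = [
    DelayEdge("enter", "stop", 1, 10),
    DelayEdge("stop", "leave", 0, 5),
]
recognised = satisfies(network, {"enter": 0, "stop": 4, "leave": 7})
```

Here the vertices are the event names and each edge carries the admissible delay between two events; a CE instance is recognised when all edges are satisfied by the observed timestamps.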

In the domain of activity recognition, human activities are recognised with the use of
context-free grammars (Ryoo & Aggarwal, 2006), while a logic-programming approach has
been presented by Shet et al. (2005). In the latter method, CE take the form of
common-sense logic programming rules. One important problem with these approaches is their
inadequate handling of the temporal aspects of the task; for example, there are no rules to
handle persistence of CE in a general way or to impose temporal constraints over intervals.
A more expressive hierarchical event representation for video analysis is proposed by
Hakeem and Shah (2007). The temporal relations between the sub-events of CE are represented
with the use of the interval algebra (Allen, 1983).

A different approach has been adopted by Paschke and Kozlenkov (2009) and Artikis
et al. (2010, 2010b), using the Event Calculus for event recognition. The benefit of using
the Event Calculus is that it provides a logic-based language for compactly representing
CE definitions, as well as axioms that express the domain-independent temporal properties
of CE, e.g. the common sense law of inertia. Moreover, it can take advantage of efficient
reasoning methods based on logic programming (Artikis et al., 2012), satisfiability testing
(Mueller, 2008) or answer set programming (Kim et al., 2009; Lee & Palla, 2012).

Symbolic methods cannot handle the uncertainty that naturally exists in many real-world
applications and may seriously compromise event recognition accuracy. We combined the
Event Calculus with the framework of Markov Logic Networks. The proposed method
inherits the properties of the Event Calculus, while supporting probabilistic event
recognition. Unlike the crisp Event Calculus, our method provides control over the
confidence of a CE being initiated or terminated, as well as the behaviour of its
persistence, ranging from deterministic to completely probabilistic. Consequently, each CE
is recognised with some probability instead of absolute certainty.

8.2 Approaches based on Probabilistic Graphical Models 

Probabilistic graphical models have been successfully applied to a variety of event recogni- 
tion tasks where a significant amount of uncertainty exists. Since event recognition requires 
the processing of streams of time-stamped SDE, numerous event recognition methods are 
based on sequential variants of probabilistic graphical models, such as Hidden Markov 
Models (HMM) (Rabiner & Juang, 1986), Dynamic Bayesian Networks (DBN) (Murphy, 
2002) and Conditional Random Fields (CRF) (Lafferty et al., 2001). Compared to symbolic 
methods, such models can naturally handle uncertainty but their propositional structure 
provides limited representation capabilities. To overcome this limitation, graphical models 
have been extended to model interactions between multiple entities (Brand et al., 1997;
Gong & Xiang, 2003; Wu et al., 2007; Vail et al., 2007), capture long-term dependencies 
between states (Hongeng & Nevatia, 2003), as well as model the hierarchical composition 
of events (Natarajan & Nevatia, 2007; Liao et al., 2005). However, the lack of a formal 
representation language makes the definition of structured CE complicated and the use of 
background knowledge very hard. 

Recently, statistical relational learning (SRL) methods have been applied to event recog- 
nition. These methods combine logic with probabilistic models, in order to represent com- 
plex relational structures and perform reasoning under uncertainty. Using a declarative 
language as a template, SRL methods specify probabilistic models at an abstract level. 
Given an input stream of SDE observations, the template is partially or completely instan- 
tiated, creating lifted or propositional graphical models on which probabilistic inference is 
performed (de Salvo Braz et al., 2008; Raedt & Kersting, 2010).

For example, HMM have been extended in order to represent states and transitions using 
logical expressions (Kersting et al., 2006; Natarajan et al., 2008). In contrast to standard 
HMM, the logical representation allows the model to represent compactly probability distri- 
butions over sequences of logical atoms, rather than propositional symbols. Similarly, DBN 
have been extended using first-order logic (Manfredotti, 2009; Manfredotti et al., 2010). A 
tree structure is used, where each node corresponds to a first-order logic expression, e.g. a 
predicate representing a CE, and can be related to nodes of the same or previous time 
instances. Compared to their propositional counterparts, the extended HMM and DBN 
methods can compactly represent CE that involve various entities. 

Our method is based on Markov Logic Networks (MLN), which are a more general and
expressive model. The knowledge base of weighted first-order logic formulas in MLN defines
an arbitrarily structured undirected graphical model. Therefore, MLN provide a generic
SRL framework which subsumes various graphical models, e.g. HMM, CRF, etc., and can be
used for expressive logic-based formalisms, such as the Event Calculus. The inertia axioms
of our method allow the model to capture long-term dependencies between events.
Additionally, being based on a discriminative model, the method avoids common independence
assumptions over the input SDE.

Markov Logic Networks have been used for event recognition in the literature. Biswas
et al. (2007) combine the information provided by different low-level classifiers with the
use of MLN, in order to recognise CE. Tran and Davis (2008) and Kembhavi et al. (2010) take
into account the confidence value of the input SDE, which may be due to noisy sensors. A
more expressive approach that can represent persistent and concurrent CE, as well as their
starting and ending points, is proposed by Helaoui et al. (2011). However, that method has
quadratic complexity in the number of time-points.

Morariu and Davis (2011) proposed an MLN-based method that uses interval relations.
The method determines the most consistent sequence of CE, based on the observations of
low-level classifiers. Similar to the methods of Tran and Davis (2008) and Kembhavi et al.
(2010), the method expresses CE in first-order logic, but it employs temporal relations
that are based on Interval Algebra (Allen, 1983). In order to avoid the combinatorial
explosion of possible intervals, as well as to eliminate the existential quantifiers in CE
definitions, a bottom-up process eliminates the unlikely CE hypotheses. That process can
only be applied to domain-dependent axioms, as it is guided by the observations and the
Interval Algebra relations. A different approach to interval-based activity recognition is
the Probabilistic Event Logic (PEL) (Brendel et al., 2011; Selman et al., 2011). Similar
to MLN, the method defines a log-linear model from a set of weighted formulas, but the
formulas are represented in Event Logic (Siskind, 2001). Each formula defines a soft
constraint over some events, using interval relations that are represented by the spanning
intervals data structure. The method performs inference via a local-search algorithm (based
on the MaxWalkSAT of Kautz et al., 1997), but by using spanning intervals it avoids
grounding all possible time intervals. In our work, we address the combinatorial explosion
problem in a more generic manner, through the efficient representation of the
domain-independent axioms. Additionally, we use a compilation procedure to further simplify
the structure of the produced network. The compilation is performed at the level of the
knowledge base and is independent of the input SDE.

Sadilek and Kautz (2012) employ hybrid-MLN (Wang & Domingos, 2008) in order to
recognise successful and failed interactions between humans, using noisy location data from
GPS devices. The method uses hybrid formulas that denoise the location data. Hybrid
formulas are defined as normal soft-constrained formulas, but their weights are also
associated with a real-valued function, e.g. the distance of two persons. As a result, the
strength of the constraint that a hybrid rule imposes is defined by both its weight and
function; e.g. the closer the distance, the stronger the constraint. The weights are
estimated from training data. However, the method does not employ any formalism for
representing the events and their effects and thus it uses only domain-dependent CE
definitions. On the other hand, the use of a hybrid approach for numeric constraints is an
interesting alternative to discretisation that we plan to study in the future.

8.3 Other Approaches on Uncertainty Reasoning

There are also methods that handle uncertainty without relying on probabilistic graphical 
models. Shet et al. (2007), for example, proposed an activity recognition method that is
based on logic programming and handles uncertainty using the Bilattice framework (Gins- 
berg, 1988). The knowledge base consists of domain-specific rules, expressing CE in terms 
of SDE. Each CE or SDE is associated with two uncertainty values, indicating a degree 
of information and confidence respectively. The underlying idea of the method is that the 
more confident information is provided, the stronger the belief about the corresponding CE 
becomes. Another logic-based method that recognises user activities over noisy or incom- 
plete data is proposed by Filippaki et al. (2011). The method recognises CE from SDE 
using rules that impose temporal and spatial constraints between SDE. Some of the con- 
straints in CE definitions are optional. As a result, CE can be recognised from incomplete 
information, but with lower confidence. The confidence of a CE increases when more of the 
optional SDE are recognised. Due to noisy or incomplete information the recognised CE 
may be logically inconsistent with each other. The method resolves those inconsistencies 
using the confidence, duration and number of involved SDE. In contrast to these meth- 
ods, our work employs MLN that have formal probabilistic semantics, as well as an Event 
Calculus formalism to represent complex CE. 

A related approach that we have developed in parallel is that of Filippou et al. (2012). 
The method employs an Event Calculus formalism that is based on probabilistic logic pro- 
gramming and handles noise in the input data. Input SDE are assumed to be independent 
and are associated with detection probabilities. The Event Calculus axioms and CE defini- 
tions in the knowledge base remain hard-constrained. Given a narrative of SDE, a CE may 
be recognised with some probability. Any initiation or termination caused by the given SDE
increases or decreases the probability that a CE holds. However, due to the closed-world
semantics of logic programming, the inertia of a CE is restricted to be deterministic. In
addition, our approach does not make any independence assumption about the input SDE.

9. Conclusions 

We addressed the issue of imperfect CE definitions that stems from the uncertainty that 
naturally exists in event recognition. We proposed a probabilistic extension of the Event 
Calculus based on Markov Logic Networks (MLN). The method has declarative and for- 
mal (probabilistic) semantics, inheriting the properties of the Event Calculus. We placed 
particular emphasis on the efficiency and effectiveness of our approach. By simplifying the 
axioms of the Event Calculus, as well as following a knowledge compilation procedure, the 
method produces compact Markov networks with reduced complexity. Consequently, the 
performance of probabilistic inference is improved, as it takes place on a simpler model. 
MLN-EC provides a choice of CE persistence, ranging from deterministic to probabilistic, 
in order to meet the requirements of different applications. Due to the use of MLN, the 
method lends itself naturally to learning the weights of event definitions from data, as the 
manual setting of weights is sub-optimal and cumbersome. MLN-EC is trained
discriminatively, using a supervised learning technique. Finally, the experimental
evaluation showed that MLN-EC outperforms its crisp equivalent on a benchmark dataset.

There are several directions in which we would like to extend our work. In many
applications the input SDE observations are accompanied by a degree of confidence, usually 
in the form of probability. Therefore, we consider extending our method in order to exploit 
data that involves such confidence values, either in the form of additional clauses (e.g. Tran 
& Davis, 2008; Morariu & Davis, 2011), or by employing different inference algorithms (e.g. 
Jain & Beetz, 2010). Furthermore, we would like to address the problems that involve 
numerical constraints by adopting a hybrid-MLN (e.g. Sadilek & Kautz, 2012) or a similar 
approach. We also consider extending our formalism in order to support temporal interval 
relations, using preprocessing techniques (e.g. Morariu & Davis, 2011), or by employing
different representation and inference methods (e.g. Brendel et al., 2011; Selman et al., 
2011). Finally, we would like to examine structure learning/refinement methods for the CE 
definitions, since they may not be easy to acquire from experts. 